EXPRESSION IN FILAMENTOUS FUNGI OF PROTEASE INHIBITORS AND VARIANTS



FIELD OF THE INVENTION 

[01] This invention relates to methods for the expression of protease inhibitors and
variants thereof in filamentous fungi. The invention discloses fusion nucleic acids, vectors, 
fusion polypeptides, and processes for obtaining the protease inhibitors. 

BACKGROUND OF THE INVENTION 

[02] Proteases are involved in a wide variety of biological processes. Disruption of the
balance between proteases and protease inhibitors is often associated with pathologic tissue 
destruction. 

[03] Various studies have focused on the role of proteinases in tissue injury, and it is
thought that the balance between proteinases and proteinase inhibitors is a major 
determinant in maintaining tissue integrity. Serine proteinases from inflammatory cells, 
including neutrophils, are implicated in various inflammatory disorders, such as pulmonary 
emphysema, arthritis, atopic dermatitis and psoriasis. 

[04] Proteases also appear to function in the spread of certain cancers. Normal cells
exist in contact with a complex protein network, called the extracellular matrix (ECM). The 
ECM is a barrier to cell movement and cancer cells must devise ways to break their 
attachments, degrade, and move through the ECM in order to metastasize. Proteases are 
enzymes that degrade other proteins and have long been thought to aid in freeing the tumor 
cells from their original location by chewing up the ECM. Recent studies have suggested 
that they may promote cell shape changes and motility through the activation of a protein in 
the tumor cell membrane called Protease-Activated Receptor-2 (PAR2). This leads to a 
cascade of intracellular reactions that activates the motility apparatus of the cell. Thus, it is 
hypothesized that one of the first steps in tumor metastasis is a reorganization of the cell 
shape such that it forms a distinct protrusion at one edge facing the direction of migration. 
The cell then migrates through a blood vessel wall and travels to distal locations, eventually
reattaching and forming a metastatic tumor. For example, human prostatic epithelial cells 
constitutively secrete prostate-specific antigen (PSA), a kallikrein-like serine protease, which 
is a normal component of the seminal plasma. The protease acts to degrade the 
extracellular matrix and facilitate invasion of cancerous cells.

[05] Synthetic and natural protease inhibitors have been shown to inhibit tumor promotion
in vivo and in vitro. Previous investigations have indicated that certain protease
inhibitors belonging to a family of structurally related proteins classified as serine protease
inhibitors, or serpins, inhibit several proteases including trypsin, cathepsin G,
thrombin, tissue kallikrein, and neutrophil elastase. The serpins are extremely
effective at preventing or suppressing carcinogen-induced transformation in vitro and
carcinogenesis in animal model systems. Systemic delivery of purified protease inhibitors
also reduces joint inflammation and cartilage and bone destruction.

[06] Topical administration of protease inhibitors finds use in such conditions as atopic
dermatitis, a common form of inflammation of the skin, which may be localized to a few
patches or involve large portions of the body. The depigmenting activity of protease
inhibitors and their capability to prevent ultraviolet-induced pigmentation have been
demonstrated both in vitro and in vivo. Paine et al., Journal of Investigative Dermatology
116, 587-595 (2001). Also, protease inhibitors have been found to help wound healing
(http://www.sciencedaily.com/releases/2000/10/001002071718.htm). Secretory leukocyte
protease inhibitor was demonstrated to reverse the tissue destruction and speed the wound
healing process when applied topically. In addition, serine protease inhibitors can also help
to reduce pain in lupus erythematosus patients (see US Patent No. 6537968).

[07] As noted above, protease inhibitors interfere with the action of proteases. Naturally
occurring protease inhibitors can be found in a variety of foods such as cereal grains (oats,
barley, and maize), Brussels sprouts, onion, beetroot, wheat, finger millet, and peanuts.
One source of interest is the soybean. The average level in soybeans is around 1.4 percent
for the Kunitz inhibitor and 0.6 percent for the Bowman-Birk inhibitor, two of the most
important protease inhibitors. These low levels make it impractical to isolate the natural
protease inhibitor for clinical applications.

[08] Thus, there is a need for a method to produce large quantities of protease inhibitors
and their variants that also reduces or eliminates the risk associated with blood-borne
infectious agents when the inhibitors are produced in mammalian tissue culture cells. The
inventive production method provided herein allows for the manufacture of large
quantities of the protein therapeutic.

BRIEF SUMMARY OF THE INVENTION 

[09] Provided herein are nucleic acids, cells and methods for the production of protease 
inhibitors and variants thereof.

[10] In a first embodiment, nucleic acids encoding a functional protease inhibitor are
provided. In one aspect, a nucleic acid comprising regulatory sequences operatively linked
to first, second, third and fourth nucleic acid sequences is provided. Terminator
sequences are provided following the fourth nucleic acid sequence.

[11] In a second aspect, the first nucleic acid sequence encodes a signal polypeptide
functional as a secretory sequence in a first filamentous fungus, the second nucleic acid
encodes a secreted polypeptide or functional portion thereof normally secreted from said first
or a second filamentous fungus, the third nucleic acid encodes a cleavable linker and the
fourth nucleic acid encodes a protease inhibitor or fragment thereof.

[12] In a third aspect, an expression cassette comprising nucleic acid sequences
encoding a protease inhibitor is provided.

[13] In a fourth aspect, the present invention relates to a polynucleotide encoding a protease
inhibitor variant. The polynucleotide may encode a Bowman-Birk Inhibitor variant wherein at 
least one loop has been altered. The polynucleotide may encode a Soybean Trypsin 
Inhibitor variant wherein at least one loop has been altered. 

[14] In a second embodiment, methods of expressing a functional protease inhibitor or 
variant thereof are provided. In one aspect, a host cell is (i) transformed with an expression 
cassette comprising a nucleic acid sequence encoding a protease inhibitor or variant thereof, 
and (ii) cultured under appropriate conditions to express the protease inhibitor or variants 
thereof. Optionally, the method further comprises recovering the protease inhibitor or variant 
thereof. 

[15] In a second aspect, a host cell is (i) transformed with a first expression cassette
comprising a nucleic acid sequence encoding a protease inhibitor or variant thereof, (ii)
transformed with a second expression cassette comprising a nucleic acid sequence
encoding a chaperone, and (iii) cultured under appropriate conditions to express the
protease inhibitor or variant thereof. Optionally, the protease inhibitor or variant thereof
may be recovered. In one aspect, the protease inhibitor or variant thereof is expressed as
a fusion protein. Optionally, the method further comprises recovering the protease inhibitor
or variant thereof.

[16] In a third embodiment, cells capable of expressing a protease inhibitor or variant
thereof are provided. Host cells are transformed with an expression cassette encoding a
protease inhibitor or variant thereof. Host cells may be selected from the group consisting of
Aspergillus and Trichoderma.

[17] In a fourth embodiment, a functional protease inhibitor or variant thereof is provided.
In one aspect, the functional protease inhibitor or variant thereof is expressed as a fusion
protein consisting of the glucoamylase signal sequence, prosequence, catalytic domain and
linker region up to amino acid number 502 of mature glucoamylase, followed by amino acids
NVISKR and then by the mature protease inhibitor or variant thereof.

[18] In a second aspect, the expressed proteins are treated with a protease to liberate a
protease inhibitor or variant thereof from the fusion protein.

[19] In a third aspect, the present invention provides a polypeptide having protease 
inhibitory activity, selected from the group consisting of 

a) Bowman-Birk Inhibitor variants; 

b) Soybean Trypsin Inhibitor variants; 

c) Bowman-Birk inhibitor; 

d) Soybean Trypsin Inhibitor; and 

e) A scaffold comprising at least one variant sequence. 

[20] Other objects, features and advantages of the present invention will become 
apparent from the following detailed description. It should be understood, however, that the 
detailed description and specific examples, while indicating preferred embodiments of the 
invention, are given by way of illustration only, since various changes and modifications 
within the scope and spirit of the invention will become apparent to one skilled in the art from 
this detailed description. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[21] Figure 1 is the codon optimized nucleotide sequence for soybean Bowman-Birk type
protease inhibitor (BBI) (SEQ ID NO:1). This sequence includes nucleotides encoding
NVISKR (dotted underline), the cleavage site for the fusion protein and three restriction
enzyme sites for cloning into the expression plasmid. The NheI site at the 5' end and XhoI
site at the 3' end are underlined and labeled. The BstEII site at the 3' end is designated by
the # symbols. The stop codon is designated by the asterisks. There is no start codon as
this is expressed as a fusion protein. The mature BBI coding sequence is indicated by the
double underline (SEQ ID NO:2). The addition of nucleotides encoding three glycine
residues (Figure 1B) prior to the mature BBI coding sequence can be done using the sequence
encoding the three glycine residues indicated in Figure 2 (SEQ ID NO:5). Figure 1C shows the
nucleotide sequence encoding BBI, the three restriction sites, the kex2 site, three glycine
residues at the N-terminal end and six histidine residues at the C-terminal end
(SEQ ID NO:76).

[22] Figure 2 is the codon optimized nucleotide sequence for Soybean Trypsin Inhibitor
(STI), a Kunitz type protease inhibitor (SEQ ID NO:3). This sequence includes nucleotides
encoding NVISKR (dotted underline) (SEQ ID NO:4), the cleavage site for the fusion protein,
and six histidine residues at the C-terminal end (indicated by the dots). Three restriction
enzyme sites (NheI at the 5' end and XhoI and BstEII at the 3' end, indicated as described for
Figure 1) for cloning into the expression plasmid were also included. The three glycine residues
after the kex2 site (NVISKR) are indicated by bold. The nucleotide sequence encoding the
mature STI is indicated by the dashed underline (SEQ ID NO:6).

[23] Figure 3A is the mature amino acid sequence for BBI (SEQ ID NO:7). Figure 3B is
BBI with three glycine residues at the N-terminal end (SEQ ID NO:8). Figure 3C is BBI with
three glycine residues at the N-terminal end and six histidine residues at the C-terminal end
(SEQ ID NO:9). In Figures 3A-C, Loop I is indicated by the underlined amino acid residues and
Loop II amino acid residues are indicated by the bold type.

[24] Figure 4A is the mature amino acid sequence for STI (SEQ ID NO:10). Figure 4B is
STI with three glycine residues at the N-terminal end (SEQ ID NO:11). Figure 4C is STI with
three glycine residues at the N-terminus and with six histidine residues at the C-terminus
(SEQ ID NO:12). Loop I is indicated by the underlined amino acid residues (SEQ ID
NO:13). Loop II amino acid residues are indicated by the bold type (SEQ ID NO:14).

[25] Figure 5 is a diagram of the expression plasmid pSLGAMpR2-BBI. This plasmid is
based on pSLGAMpR2, which is derived from pSL1180 by inserting the A. niger
glucoamylase promoter, catalytic core and terminator, a marker gene (A. niger pyrG) and a
bovine prochymosin gene. The pSL1180 plasmid is available from Amersham Biosciences
(Piscataway, NJ). The pSLGAMpR2 plasmid has the elements listed above inserted in the
same relative location as shown for pSLGAMpR2-BBI except that the bovine prochymosin
gene is located where the BBI gene is shown. Thus, the BBI gene replaces the prochymosin
gene in pSLGAMpR2 to yield pSLGAMpR2-BBI.

[26] Figure 6 is the amino acid sequences for wild-type BBI (SEQ ID NO:7) and select
variants of BBI (SEQ ID NOs:15 through 29). The wild-type BBI has the loops underlined. The
differences in the variants from the wild-type are shown as either bold/underlined (Loop I) or
bold (Loop II). In some variants, e.g., C2, C3, C4, C5 and Factor B, alanine at positions
(between two cysteines) was also changed to either "Serine", "Glycine" or "Glutamine".
Also, the compstatin peptide has 9 amino acids instead of 7. The variant sequences are also
shown (SEQ ID NOs:30 through 40).

[27] Figure 7 is a photograph of an agarose gel. Lane 1 contains molecular weight
markers. Lane 2 is the untransformed parental strain. Lane 3 is the parental strain
transformed with BBI-encoding DNA. Lane 4 is the parental strain co-transformed with a
BBI-encoding vector and a chaperone (pdiA)-encoding vector. Lane 5 is the parental strain
co-transformed with a BBI-encoding vector and a chaperone (prpA)-encoding vector.
Expression of the desired protein, e.g., BBI, was enhanced in the presence of the
chaperone.

[28] Figure 8 is a diagram of the plasmid pTrex4. 

[29] Figure 9 A-D is the nucleic acid sequence for pTrex2 (SEQ ID NO:41).

DETAILED DESCRIPTION 

[30] The invention will now be described in detail by way of reference only using the
following definitions and examples. All patents and publications, including all sequences 
disclosed within such patents and publications, referred to herein are expressly incorporated 
by reference. 

[31] Unless defined otherwise herein, all technical and scientific terms used herein have
the same meaning as commonly understood by one of ordinary skill in the art to which this
invention belongs. Singleton, et al., Dictionary of Microbiology and Molecular
Biology, 2D Ed., John Wiley and Sons, New York (1994), and Hale & Marham, The Harper
Collins Dictionary of Biology, Harper Perennial, NY (1991) provide one of skill with a
general dictionary of many of the terms used in this invention. Although any methods and
materials similar or equivalent to those described herein can be used in the practice or
testing of the present invention, the preferred methods and materials are described.
Numeric ranges are inclusive of the numbers defining the range. Unless otherwise
indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences
are written left to right in amino to carboxy orientation, respectively. Practitioners are
particularly directed to Sambrook et al., 1989, and Ausubel FM et al., 1993, for definitions
and terms of the art. It is to be understood that this invention is not limited to the particular
methodology, protocols, and reagents described, as these may vary.

[32] The headings provided herein are not limitations of the various aspects or
embodiments of the invention which can be had by reference to the specification as a whole.
Accordingly, the terms defined immediately below are more fully defined by reference to the
specification as a whole.

DEFINITIONS

[33] An "expression cassette" or "expression vector" is a nucleic acid construct generated 
recombinantly or synthetically, with a series of specified nucleic acid elements that permit 
transcription of a particular nucleic acid in a target cell. The recombinant expression 
cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, 
virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an 
expression vector includes, among other sequences, a nucleic acid sequence to be 
transcribed and a promoter. Expression cassette may be used interchangeably with DNA 
construct and its grammatical equivalents. 

[34] As used herein, the term "vector" refers to a nucleic acid construct designed to
transfer nucleic acid sequences into cells. An "expression vector" refers to a vector that has
the ability to incorporate and express heterologous DNA fragments in a foreign cell. Many
prokaryotic and eukaryotic expression vectors are commercially available. Selection of
appropriate expression vectors is within the knowledge of those having skill in the art.

[35] As used herein, the term "plasmid" refers to a circular double-stranded (ds) DNA
construct used as a cloning vector, and which forms an extrachromosomal self-replicating
genetic element in some eukaryotes or integrates into the host chromosomes.

[36] The term "nucleic acid molecule" or "nucleic acid sequence" includes RNA, DNA and
cDNA molecules. It will be understood that, as a result of the degeneracy of the genetic
code, a multitude of nucleotide sequences encoding a given protein may be produced.
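
The scale of this degeneracy can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of the disclosed methods; the function name count_encodings is hypothetical, and the codon counts are the textbook values for the standard genetic code. It multiplies the number of synonymous codons for each residue of a peptide, using the NVISKR linker named in this specification as the example.

```python
# Number of synonymous codons per amino acid (one-letter codes) in the
# standard genetic code. These are textbook values; stop codons are excluded.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Return how many distinct codon sequences encode `peptide`.

    Hypothetical helper for illustration: the result is the product of
    the synonymous-codon counts of the individual residues.
    """
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


if __name__ == "__main__":
    # NVISKR: N(2) * V(4) * I(3) * S(6) * K(2) * R(6)
    print(count_encodings("NVISKR"))  # 1728
```

Even this six-residue peptide has 1728 possible coding sequences, which is why codon-optimized sequences such as those of Figures 1 and 2 represent one choice among many.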
[37] As used herein, a "fusion DNA sequence" comprises from 5' to 3' a first, second, third 
and fourth DNA sequences. 

[38] As used herein, "a first nucleic acid sequence" or "first DNA sequence" encodes a
signal peptide functional as a secretory sequence in a first filamentous fungus. Such signal
sequences include those from glucoamylase, alpha-amylase and aspartyl proteases from
Aspergillus niger var. awamori, Aspergillus niger, Aspergillus oryzae, signal sequences from
cellobiohydrolase I, cellobiohydrolase II, endoglucanase I, endoglucanase III from
Trichoderma, signal sequences from glucoamylase from Neurospora and Humicola as well
as signal sequences from eukaryotes including the signal sequence from bovine chymosin,
human tissue plasminogen activator, human interferon and synthetic consensus eukaryotic
signal sequences such as that described by Gwynne et al. (1987) Bio/Technology 5,
713-719. Particularly preferred signal sequences are those derived from polypeptides secreted
by the expression host used to express and secrete the fusion polypeptide. For example,
the signal sequence from glucoamylase from Aspergillus niger is preferred when expressing
and secreting a fusion polypeptide from Aspergillus niger. As used herein, first amino acid
sequences correspond to secretory sequences which are functional in a filamentous fungus.
Such amino acid sequences are encoded by first DNA sequences as defined.

[39] As used herein, "second DNA sequences" encode "secreted polypeptides" normally
expressed from filamentous fungi. Such secreted polypeptides include glucoamylase,
alpha-amylase and aspartyl proteases from Aspergillus niger var. awamori, Aspergillus niger, and
Aspergillus oryzae, cellobiohydrolase I, cellobiohydrolase II, endoglucanase I and
endoglucanase III from Trichoderma and glucoamylase from Neurospora species and
Humicola species. As with the first DNA sequences, preferred secreted polypeptides are
those which are naturally secreted by the filamentous fungal expression host. Thus, for
example, when using Aspergillus niger, preferred secreted polypeptides are glucoamylase
and alpha-amylase from Aspergillus niger, most preferably glucoamylase. In one aspect the
glucoamylase is greater than 95%, 96%, 97%, 98% or 99% homologous with an Aspergillus
glucoamylase.

[40] When Aspergillus glucoamylase is the secreted polypeptide encoded by the second 
DNA sequence, the whole protein or a portion thereof may be used, optionally including a 
prosequence. Thus, the cleavable linker polypeptide may be fused to glucoamylase at any 
amino acid residue from position 468 - 509. Other amino acid residues may be the fusion 
site but utilizing the above residues is particularly advantageous. 

[41] A "functional portion of a secreted polypeptide" or grammatical equivalents means a
truncated secreted polypeptide that retains its ability to fold into a normal, albeit truncated,
configuration. For example, in the case of bovine chymosin production by A. niger var.
awamori it has been shown that fusion of prochymosin following the 11th amino acid of
mature glucoamylase provided no benefit compared to production of preprochymosin (US
patent 5,364,770). In USSN 08/318,494, it was shown that fusion of prochymosin onto the
C-terminus of preproglucoamylase up to the 297th amino acid of mature glucoamylase plus
a repeat of amino acids 1-11 of mature glucoamylase yielded no secreted chymosin in A.
niger var. awamori. In the latter case it is unlikely that the portion (approximately 63%) of the
glucoamylase catalytic domain present in the fusion protein was able to fold correctly, so that
an aberrant, mis-folded and/or unstable fusion protein may have been produced which could
not be secreted by the cell. The inability of the partial catalytic domain to fold correctly may
have interfered with the folding of the attached chymosin. Thus, it is likely that sufficient
residues of a domain of the naturally secreted polypeptide must be present to allow it to fold
in its normal configuration independently of the desired polypeptide to which it is attached.

[42] In most cases, the portion of the secreted polypeptide will be both correctly folded
and result in increased secretion as compared to its absence.

[43] Similarly, in most cases, the truncation of the secreted polypeptide means that the 
functional portion retains a biological function. In a preferred embodiment, the catalytic 
domain of a secreted polypeptide is used, although other functional domains may be used, 
for example, the substrate binding domains. In the case of Aspergillus niger and Aspergillus 
niger var. awamori glucoamylase, preferred functional portions retain the catalytic domain of 
the enzyme, and include amino acids 1-471. Additionally preferred embodiments utilize the
catalytic domain and all or part of the linker region. Alternatively, the starch binding domain
of glucoamylase may be used, which comprises amino acids 509-616 of Aspergillus niger
and Aspergillus niger var. awamori glucoamylase. 

[44] As used herein, "third DNA sequences" comprise DNA sequences encoding a
cleavable linker polypeptide. Such sequences include those which encode the prosequence
of glucoamylase, the prosequence of bovine chymosin, the prosequence of subtilisin,
prosequences of retroviral proteases including human immunodeficiency virus protease and
DNA sequences encoding amino acid sequences recognized and cleaved by trypsin, factor
Xa, collagenase, clostripain, subtilisin, chymosin, yeast KEX2 protease, Aspergillus KEXB and
the like. See, e.g., Marston, F.A.O. (1986) Biochem. J. 240, 1-12. Such third DNA
sequences may also encode the amino acid methionine that may be selectively cleaved by
cyanogen bromide. It should be understood that the third DNA sequence need only encode
that amino acid sequence which is necessary to be recognized by a particular enzyme or
chemical agent to bring about cleavage of the fusion polypeptide. Thus, the entire
prosequence of, for example, glucoamylase, chymosin or subtilisin need not be used.
Rather, only that portion of the prosequence which is necessary for recognition and cleavage
by the appropriate enzyme is required.

[45] It should be understood that the third nucleic acid need only encode that amino acid 
sequence which is necessary to be recognized by a particular enzyme or chemical agent to 
bring about cleavage of the fusion polypeptide. 

[46] Particularly preferred cleavable linkers are the KEX2 protease recognition site
(Lys-Arg), which can be cleaved by a native Aspergillus KEX2-like (KEXB) protease, trypsin
protease recognition sites of Lys and Arg, and the cleavage recognition site for
endoproteinase-Lys-C.
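
The effect of a Lys-Arg linker can be sketched computationally. The following Python example is a simplified illustration and not a model of actual enzyme specificity: it assumes cleavage occurs immediately after every Lys-Arg (KR) dipeptide, the function name cleave_after_lys_arg is hypothetical, and the carrier and desired-polypeptide sequences are placeholders rather than sequences from this specification. Only the NVISKR linker is taken from the text.

```python
from typing import List


def cleave_after_lys_arg(fusion: str) -> List[str]:
    """Split a one-letter amino acid string after each Lys-Arg (KR) pair.

    Simplifying assumption: every KR dipeptide is cleaved, regardless of
    sequence context or accessibility in the folded protein.
    """
    fragments = []
    start = 0
    i = 0
    while i < len(fusion) - 1:
        if fusion[i:i + 2] == "KR":
            fragments.append(fusion[start:i + 2])  # fragment ends with KR
            start = i + 2
            i += 2
        else:
            i += 1
    if start < len(fusion):
        fragments.append(fusion[start:])  # remainder after the last site
    return fragments


if __name__ == "__main__":
    carrier = "AAAAA"   # placeholder for the secreted carrier polypeptide
    linker = "NVISKR"   # linker ending in the Lys-Arg recognition site
    desired = "GGGDD"   # placeholder for the desired polypeptide
    print(cleave_after_lys_arg(carrier + linker + desired))
    # ['AAAAANVISKR', 'GGGDD']
```

In this sketch the desired polypeptide is released with its own N-terminus intact, while the linker residues remain attached to the carrier fragment. A carrier that itself contained a KR pair would also be cut under the stated assumption.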

[47] As used herein, "fourth DNA sequences" encode "desired polypeptides." Such 
desired polypeptides include protease inhibitors and variants thereof.

[48] The above-defined four DNA sequences encoding the corresponding four amino acid
sequences are combined to form a "fusion DNA sequence." Such fusion DNA sequences
are assembled in proper reading frame from the 5' terminus to 3' terminus in the order of
first, second, third and fourth DNA sequences. As so assembled, the DNA sequence will
encode a "fusion polypeptide" or "fusion protein" or "fusion analog" comprising from its
amino-terminus a signal peptide functional as a secretory sequence in a filamentous fungus, a
secreted polypeptide or portion thereof normally secreted from a filamentous fungus, a
cleavable linker polypeptide and a desired polypeptide.
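
The requirement that the four sequences be joined in proper reading frame can be expressed as a simple check. The following Python sketch is illustrative only: the function name assemble_fusion is hypothetical, the toy segments are not sequences from this specification, and it assumes for simplicity that each segment begins and ends on a codon boundary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_fusion(first: str, second: str, third: str, fourth: str) -> str:
    """Join four coding segments 5' to 3' and verify the reading frame.

    Assumption for illustration: every segment is a whole number of codons,
    so concatenation preserves the frame of each downstream segment.
    """
    names = ("first", "second", "third", "fourth")
    segments = [s.upper() for s in (first, second, third, fourth)]
    for name, seg in zip(names, segments):
        if len(seg) % 3 != 0:
            raise ValueError(
                f"{name} segment length {len(seg)} is not a multiple of 3"
            )
    fusion = "".join(segments)
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    # A stop codon before the final codon would truncate the fusion polypeptide.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            raise ValueError(
                f"premature stop codon {codon} at codon {index + 1}"
            )
    return fusion


if __name__ == "__main__":
    # Toy segments: Met-Ala, Gly-Glu, Lys-Arg (linker), Cys-stop.
    print(assemble_fusion("ATGGCT", "GGTGAA", "AAGCGT", "TGCTAA"))
    # ATGGCTGGTGAAAAGCGTTGCTAA
```

A segment whose length is not a multiple of three, or an in-frame stop codon ahead of the fourth sequence, raises an error here because either would prevent translation of the desired polypeptide.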

[49] As used herein, the terms "desired protein" or "desired polypeptide" refer to a
polypeptide or protein in its mature form that is not fused to a secretion enhancing construct.
Thus, a "desired protein" or "desired polypeptide" refers to the protein to be expressed and
secreted by the host cell in a non-fused form.

[50] As used herein, a "fusion polypeptide" or "fusion protein" or "fusion analog" comprises
from its amino-terminus a signal peptide functional as a secretory sequence in a
host cell, a secreted polypeptide or portion thereof normally secreted from a host cell, a
cleavable linker polypeptide and a desired polypeptide. The fusion protein may be
processed by host cell enzymes, e.g., a protease, to yield the desired protein free from the
other protein sequences in the fusion protein. As used herein, the terms "fusion analog" or
"fusion polypeptide" or "fusion protein" may be used interchangeably.

[51] As used herein, a "promoter sequence" is a DNA sequence which is recognized by
the particular filamentous fungus for expression purposes. It is operably linked to a DNA
sequence encoding the above defined fusion polypeptide. Such linkage comprises
positioning of the promoter with respect to the translation initiation codon of the
fusion DNA sequence. The promoter sequence contains
transcription and translation control sequences which mediate the expression of the fusion
DNA sequence. Examples include the promoter from the A. niger var. awamori or A. niger
glucoamylase genes (Nunberg, J.H. et al. (1984) Mol. Cell. Biol. 4, 2306-2315; Boel, E. et al.
(1984) EMBO J. 3, 1581-1585), the A. oryzae, A. niger var. awamori or A. niger
alpha-amylase genes, the Rhizomucor miehei carboxyl protease gene, the Trichoderma reesei
cellobiohydrolase I gene (Shoemaker, S.P. et al. (1984) European Patent Application No.
EPO0137280A1), the A. nidulans trpC gene (Yelton, M. et al. (1984) Proc. Natl. Acad. Sci.
USA 81, 1470-1474; Mullaney, E.J. et al. (1985) Mol. Gen. Genet. 199, 37-45), the A.
nidulans alcA gene (Lockington, R.A. et al. (1986) Gene 33, 137-149), the A. nidulans amdS
gene (McKnight, G.L. et al. (1986) Cell 46, 143-147), the A. nidulans amdS gene (Hynes,
M.J. et al. (1983) Mol. Cell Biol. 3, 1430-1439), and higher eukaryotic promoters such as the
SV40 early promoter (Barclay, S.L. and E. Meller (1983) Molecular and Cellular Biology 3,
2117-2130).

[52] Likewise a "terminator sequence" is a DNA sequence which is recognized by the
expression host to terminate transcription. It is operably linked to the 3' end of the fusion
DNA encoding the fusion polypeptide to be expressed. Examples include the terminator
from the A. nidulans trpC gene (Yelton, M. et al. (1984) Proc. Natl. Acad. Sci. USA 81,
1470-1474; Mullaney, E.J. et al. (1985) Mol. Gen. Genet. 199, 37-45), the A. niger var. awamori or
A. niger glucoamylase genes (Nunberg, J.H. et al. (1984) Mol. Cell. Biol. 4, 2306-2315; Boel,
E. et al. (1984) EMBO J. 3, 1581-1585), the A. oryzae, A. niger var. awamori or A. niger
alpha-amylase genes and the Rhizomucor miehei carboxyl protease gene (EPO Publication
No. 0 215 594), although any fungal terminator is likely to be functional in the present
invention.

[53] A "polyadenylation sequence" is a DNA sequence which when transcribed is
recognized by the expression host to add polyadenosine residues to transcribed mRNA. It is
operably linked to the 3' end of the fusion DNA encoding the fusion polypeptide to be
expressed. Examples include polyadenylation sequences from the A. nidulans trpC gene
(Yelton, M. et al. (1984) Proc. Natl. Acad. Sci. USA 81, 1470-1474; Mullaney, E.J. et al.
(1985) Mol. Gen. Genet. 199, 37-45), the A. niger var. awamori or A. niger glucoamylase
genes (Nunberg, J.H. et al. (1984) Mol. Cell. Biol. 4, 2306-2315; Boel, E. et al. (1984)
EMBO J. 3, 1581-1585), the A. oryzae, A. niger var. awamori or A. niger alpha-amylase
genes and the Rhizomucor miehei carboxyl protease gene described above. Any fungal
polyadenylation sequence, however, is likely to be functional in the present invention.

[54] As used herein, the term "selectable marker-encoding nucleotide sequence" refers to
a nucleotide sequence which is capable of expression in fungal cells and where expression
of the selectable marker confers to cells containing the expressed gene the ability to grow in
the presence of a corresponding selective condition.

[55] A nucleic acid is "operably linked" when it is placed into a functional relationship with
another nucleic acid sequence. For example, DNA encoding a secretory leader is operably
linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the
secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence
if it affects the transcription of the sequence; or a ribosome binding site is operably linked to
a coding sequence if it is positioned so as to facilitate translation. Generally, "operably
linked" means that the DNA sequences being linked are contiguous, and, in the case of a
secretory leader, contiguous and in reading phase. However, enhancers do not have to be
contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites
do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with
conventional practice.

[56] As used herein, "recombinant" includes reference to a cell or vector that has been
modified by the introduction of a heterologous nucleic acid sequence, or to a cell that is
derived from a cell so modified. Thus, for example, recombinant cells express genes that
are not found in identical form within the native (non-recombinant) form of the cell or express
native genes that are otherwise abnormally expressed, under expressed or not expressed at
all as a result of deliberate human intervention.

[57] As used herein, the term "expression" refers to the process by which a polypeptide is
produced based on the nucleic acid sequence of a gene. The process includes both
transcription and translation. It follows that the term "protease inhibitor expression" refers to
transcription and translation of the gene encoding the specific protease inhibitor or variant
thereof to be expressed, the products of which include precursor RNA, mRNA, polypeptide,
post-translation processed polypeptide, and derivatives thereof. Similarly, "protease inhibitor
expression" refers to the transcription, translation and assembly of protease inhibitors and
variants thereof into a form exemplified by Figure 6. By way of example, assays for protease
inhibitor expression include examination of fungal colonies when exposed to the appropriate
conditions, western blot for protease inhibitor protein, as well as northern blot analysis and
reverse transcriptase polymerase chain reaction (RT-PCR) assays for protease inhibitor
mRNA.

[58] As used herein the term "glycosylated" means that oligosaccharide molecules have 
been added to particular amino acid residues on a protein. A "de-glycosylated" protein is a 
protein that has been treated to partially or completely remove the oligosaccharide 
molecules from the protein. An "aglycosylated" protein is a protein that has not had the
oligosaccharide molecules added to the protein. This may be due to a mutation in the 
protein that prevents the addition of the oligosaccharide. 

[59] A "non-glycosylated" protein is a protein that does not have the oligosaccharide 
attached to the protein. This may be due to various reasons, including but not limited to, the 
absence of enzymes responsible for the addition of the oligosaccharides to proteins. The 
term "non-glycosylated" encompasses both proteins that have not had the oligosaccharide 
added to the protein and those in which the oligosaccharides have been added but were 
subsequently removed. An "aglycosylated" protein may be a "non-glycosylated" protein. A
"non-glycosylated" protein may be either an "aglycosylated" protein or a "deglycosylated"
protein.

[60] The terms "isolated" or "purified" as used herein refer to a nucleic acid or polypeptide
that is removed from at least one component with which it is naturally associated.

[61] The term "substantially free" includes preparations of the desired polypeptide having
less than about 20% (by dry weight) other proteins (i.e., contaminating protein), less than
about 10% other proteins, less than about 5% other proteins, or less than about 1% other
proteins.
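By way of illustration only, the purity tiers recited above can be checked with simple arithmetic. The following Python sketch assumes that the dry weight of the preparation is approximated by its total protein mass; the function names and the example masses are hypothetical and are not taken from this disclosure.

```python
from typing import List

# Thresholds recited in the text (percent other proteins by dry weight),
# ordered from least to most stringent.
THRESHOLDS = (20.0, 10.0, 5.0, 1.0)


def contaminant_percent(target_mg: float, other_protein_mg: float) -> float:
    """Percent of total protein mass that is not the desired polypeptide.

    Assumption: dry weight is approximated by total protein mass.
    """
    total = target_mg + other_protein_mg
    if total <= 0:
        raise ValueError("total protein mass must be positive")
    return 100.0 * other_protein_mg / total


def tiers_met(target_mg: float, other_protein_mg: float) -> List[float]:
    """Return every recited threshold that the preparation falls below."""
    pct = contaminant_percent(target_mg, other_protein_mg)
    return [t for t in THRESHOLDS if pct < t]


if __name__ == "__main__":
    # Hypothetical preparation: 99.5 mg desired polypeptide, 0.5 mg other protein.
    print(tiers_met(99.5, 0.5))
```

Because each tier is a strict "less than" bound, a preparation containing exactly 5% other proteins satisfies only the 20% and 10% tiers in this sketch.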

[62] The term "substantially pure" when applied to the proteins or fragments thereof of the 
present invention means that the proteins are essentially free of other substances to an 
extent practical and appropriate for their intended use. In particular, the proteins are 
sufficiently pure and are sufficiently free from other biological constituents of the host cells so 
as to be useful in, for example, protein sequencing, or producing pharmaceutical 
preparations. 

[63] The term "target protein" as used herein refers to a protein, e.g., an enzyme, hormone
or the like, whose action would be blocked by the binding of the variant inhibitors provided 
for herein. 

[64] The terms "variant sequence" or "variant sequences" refer to the short polypeptide 
sequence(s) that replace the binding loops of the wild-type protease inhibitor or other 
scaffold. The variant sequence does not need to be of the same length as the binding loop 
sequence it is replacing in the scaffold. 

[65] The term "scaffold" refers to the wild-type protein sequence into which a variant 
sequence may be introduced. In an embodiment the scaffold will have portions, e.g., loops, 
that may be replaced. For example, the STI and BBI sequences used herein would be a
scaffold for a variant sequence. 

PROTEASE INHIBITORS 

[66] Two protein protease inhibitors have been isolated from soybeans, the Kunitz-type
trypsin inhibitor (soybean trypsin inhibitor, STI) and the Bowman-Birk protease inhibitor
(BBI). See, e.g., Birk, Int. J. Pept. Protein Res. 25:113-131 (1985) and Kennedy, Am. J. Clin.
Nutr. 68:1406S-1412S (1998). These inhibitors serve as a scaffold for the variant
sequences.

[67] In addition to alterations in the scaffold comprising the variant sequences, other
desired proteins used herein include the addition of three glycine residues at the N-terminal
and/or six histidine residues at the C-terminal. See Figures 3 and 4.

Soybean Trypsin Inhibitor (STI)

[68] STI inhibits the proteolytic activity of trypsin by the formation of a stable 
stoichiometric complex. See, e.g., Liu, K., Chemistry and Nutritional value of soybean 
components. In: Soybeans, chemistry, technology and utilization, pp. 32-35 (Aspen 
publishers, Inc., Gaithersburg, Md., 1999). STI consists of 181 amino acid residues with two 
disulfide bridges and is roughly spherically shaped. See, e.g., Song et al., J. Mol. Biol. 
275:347-63 (1998). The two disulfide bridges form two binding loops similar to those 
described below for BBI. 

[69] The Kunitz-type soybean trypsin inhibitor (STI) has played a key role in the early 
study of proteinases, having been used as the main substrate in the biochemical and kinetic 
work that led to the definition of the standard mechanism of action of proteinase inhibitors. 
Bowman-Birk Inhibitor (BBI)

[70] BBI proteins are a kinetically and structurally well-characterized family of small 
proteins (60-90 residues) isolated from leguminous seeds. They have a symmetrical 
structure of two tricyclic domains each containing an independent binding loop. Loop I 
typically inhibits trypsin and loop II chymotrypsin (Chen et al., J. Biol. Chem. (1992)
267:1990-1994; Werner & Wemmer, 1992; Lin et al., Eur. J. Biochem. (1993) 212:549-555;
Voss et al., Eur. J. Biochem. (1996) 242:122-131). These binding regions each contain a
"canonical loop" structure, which is a motif found in a variety of serine proteinase inhibitors 
(Bode & Huber, Eur. J. Biochem. (1992) 204:433-451). 

[71] BBI is an 8 kDa protein that inhibits the proteases trypsin and chymotrypsin at
separate reactive sites. See, e.g., Billings et al., Proc. Natl. Acad. Sci. 89:3120-3124 (1992).
STI and BBI are found only in the soybean seed, and not in any other part of the plant. See, 
e.g., Birk, Int. J. Pept. Protein Res. 25:113-131 (1985). 

[72] Although numerous isoforms of BBI have been characterized, SEQ ID NO: 7 (Figure 
3) shows the amino acid sequence of the BBI backbone used herein comprising 
approximately 71 amino acid residues. In addition, BBI may become truncated with as many 
as 10 amino acid residues being removed from either the N- or C- terminal. For example, 
upon seed desiccation, a BBI may have the C-terminal 9 or 10 amino acid residues
removed. Thus, proteolysis is highly tolerated prior to the initial disulphide and just after the
terminal disulphide bond, the consequences of which are usually not detrimental to the
binding to target protein. However, it will be appreciated that any one of the isoforms or
truncated forms could be used.
Protease Inhibitor Variants 

[73] As noted above, the STI and BBI protease inhibitors have binding loops that inhibit 
proteases. The inventive protease inhibitor variants provided for herein have alterations in 
Loop I, Loop II or both loops. In an embodiment, the loops are replaced with sequences that 
interact with a target protein. 

[74] The loops can be replaced with sequences derived from VEGF binding proteins, 
inhibitors of the complement pathway such as C2, C3, C4 or C5 inhibitors, cotton binding 
proteins, Compstatin and the like. Alternatively, variant sequences can be selected by 
various methods known in the art such as, for example, phage display or other screening 
method. For example, a random peptide gene library is fused with the phage pIII gene so that the
peptide library will be displayed on the surface of the phage. Subsequently, the phage
display library is exposed to the target protein and washed with buffer to remove non-specific
binding (this process is sometimes referred to as panning). Finally, the binding phage are
isolated and the DNA sequence encoding the displayed peptide is amplified by PCR.

[75] Generally, a loop will be replaced with a variant sequence, i.e., peptides, 3 to 14 
amino acids in length, 5 to 10 amino acids being preferred. Longer sequences may be used 
as long as they provide the binding and/or inhibition desired. In addition, peptides suitable 
for use as replacements of the binding loop(s) should adopt a functional conformation when 
contained within a constrained loop, i.e., a loop formed by the presence of a disulfide bond
between two cysteine residues. In specific embodiments, the peptides are between 7 and 9 
amino acids in length. These replacement sequences also provide protease inhibition or 
binding to the targeted proteins. 

[76] In some cases it may be advantageous to alter a single amino acid. Specifically, the
Alanine at residue 13 of wild-type STI or BBI may be changed to a Serine, a Glycine or a 
Glutamine. 
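By way of illustration only, the loop replacement and single-residue substitution described above can be sketched as simple string operations. The following Python sketch makes several assumptions that are not part of this disclosure: `TOY_SCAFFOLD` is a short illustrative string and is not SEQ ID NO: 7, the variant peptide `YNLYGWT` is invented for the example, and the function names are hypothetical. The sketch enforces the 3 to 14 residue length range and requires that the replaced segment be flanked by cysteines so that the loop remains disulfide-constrained.

```python
# Hypothetical toy scaffold (NOT SEQ ID NO: 7); it has Ala at residue 13 and a
# seven-residue loop bounded by Cys 14 and Cys 22 (1-based numbering).
TOY_SCAFFOLD = "DDESSKPCCDQCACTKSNPPQCRCSD"


def replace_loop(scaffold: str, loop_start: int, loop_end: int, variant: str,
                 min_len: int = 3, max_len: int = 14) -> str:
    """Replace scaffold[loop_start:loop_end] (0-based, end exclusive) with variant.

    The variant need not have the same length as the loop it replaces.
    """
    if not min_len <= len(variant) <= max_len:
        raise ValueError("variant sequence must be 3 to 14 residues long")
    if loop_start < 1 or loop_end >= len(scaffold) or loop_start >= loop_end:
        raise ValueError("loop coordinates fall outside the scaffold")
    if scaffold[loop_start - 1] != "C" or scaffold[loop_end] != "C":
        raise ValueError("loop must be flanked by cysteine residues")
    return scaffold[:loop_start] + variant + scaffold[loop_end:]


def substitute(seq: str, position: int, new_residue: str, expected: str = "") -> str:
    """Point substitution at a 1-based position, optionally checking the old residue."""
    idx = position - 1
    if expected and seq[idx] != expected:
        raise ValueError(f"expected {expected} at position {position}, found {seq[idx]}")
    return seq[:idx] + new_residue + seq[idx + 1:]


if __name__ == "__main__":
    print(replace_loop(TOY_SCAFFOLD, 14, 21, "YNLYGWT"))
    print(substitute(TOY_SCAFFOLD, 13, "S", expected="A"))
```

A real design would additionally confirm that the inserted peptide adopts a functional conformation within the constrained loop, which string manipulation cannot show.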

FUSION PROTEINS 

[77] Each protease inhibitor and variant thereof will be expressed as a fusion protein by 
the host fungal cell. Although cleavage of the fusion polypeptide to release the desired 
protein will often be useful, it is not necessary. Protease inhibitors and variants thereof 
expressed and secreted as fusion proteins surprisingly retain their function.

[78] The above-defined four DNA sequences encoding the corresponding four amino acid
sequences are combined to form a "fusion DNA sequence." Such fusion DNA sequences
are assembled in proper reading frame from the 5' terminus to 3' terminus in the order of
first, second, third and fourth DNA sequences. As so assembled, the DNA sequence will
encode a "fusion polypeptide" comprising, from its amino-terminus, a signal peptide functional
as a secretory sequence in a filamentous fungus, a secreted polypeptide or portion thereof
normally secreted from a filamentous fungus, a cleavable linker peptide and a desired
polypeptide, e.g., a protease inhibitor and variants thereof.
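By way of illustration only, the in-frame assembly of the four DNA sequences can be sketched as follows. This Python sketch assumes the standard genetic code and uses very short, invented segments (a two-codon "signal", a two-codon "carrier", a Lys-Arg linker, and a two-codon "desired" segment followed by a stop codon); none of these sequences, nor the function names, come from this disclosure. For simplicity the sketch requires every segment to be a whole number of codons, which is stricter than the reading-frame requirement itself.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def assemble_fusion(signal: str, carrier: str, linker: str, desired: str) -> Tuple[str, str]:
    """Join the four segments 5' to 3' and return (fusion DNA, fusion polypeptide)."""
    names = ("signal", "carrier", "linker", "desired")
    parts = (signal, carrier, linker, desired)
    for name, part in zip(names, parts):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} segment would shift the reading frame")
    fusion = "".join(parts)
    protein = translate(fusion)
    if "*" in protein[:-1]:
        raise ValueError("premature stop codon inside the fusion")
    return fusion, protein


if __name__ == "__main__":
    # Invented toy segments; AAGCGT encodes Lys-Arg, used here as a stand-in linker.
    dna, protein = assemble_fusion("ATGTCG", "GCTACC", "AAGCGT", "GATGACTAA")
    print(dna, protein)
```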

[79] Production of fusion proteins can be accomplished by use of the methods disclosed 
in, for example, US Patents 5,411,873, 5,429,950, and 5,679,543. Other methods are well
known in the art. 

EXPRESSION OF A RECOMBINANT PROTEASE INHIBITOR

[80] To the extent that this invention depends on the production of fusion proteins, it relies
on routine techniques in the field of recombinant genetics. Basic texts disclosing the general
methods of use in this invention include Sambrook et al., Molecular Cloning, A Laboratory
Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual
(1990); and Ausubel et al., eds., Current Protocols in Molecular Biology (1994).

[81] This invention provides filamentous fungal host cells which have been transduced, 
transformed or transfected with an expression vector comprising a protease inhibitor- 
encoding nucleic acid sequence. The culture conditions, such as temperature, pH and the 
like, are those previously used for the parental host cell prior to transduction, transformation 
or transfection and will be apparent to those skilled in the art. 

[82] Basically, a nucleotide sequence encoding a fusion protein is operably linked to a 
promoter sequence functional in the host cell. This promoter-gene unit is then typically 
cloned into intermediate vectors before transformation into the host cells for replication 
and/or expression. These intermediate vectors are typically prokaryotic vectors, e.g., 
plasmids, or shuttle vectors. 

[83] In one approach, a filamentous fungal cell line is transfected with an expression 
vector having a promoter or biologically active promoter fragment or one or more (e.g., a 
series) of enhancers which functions in the host cell line, operably linked to a nucleic acid 
sequence encoding a protease inhibitor, such that the protease inhibitor is expressed in the cell
line. In a preferred embodiment, the DNA sequences encode a protease inhibitor or variant
thereof. In another preferred embodiment, the promoter is a regulatable one.

A. Codon Optimization

[84] Optimizing codon usage by comparing genes that express well with genes that do not
express well is known in the art. See Barnett et al., GB2200118 and Bergquist et al.,
Extremophiles (2002) 6:177-184. Codon optimization, as used herein, was based on
comparing heterologous proteins that are expressed well in Aspergillus and native secreted
proteins to the heterologous proteins that are not expressed well. See Table I.



Table I:

Proteins that expressed well

Proteins that did not express well

glucoamylase
alpha-amylase
Stachybotrys laccase A
Stachybotrys laccase B
human trypsin
SCCE

bovine prochymosin
Her2 antibodies light chain

Human DPPIV
NEP



[85] Selected codons that were not used or not used often in the expressed proteins were
changed to codons that were used often. Therefore, only a subset of codons was
changed.
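By way of illustration only, the partial codon replacement described above can be sketched as follows. This Python sketch assumes the standard genetic code, tallies codon usage from a set of reference coding sequences standing in for well-expressed genes, and replaces only those codons in a target gene whose share among synonymous codons falls below a cutoff. The cutoff `min_fraction`, the function names, and the toy sequences are hypothetical and are not taken from this disclosure.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(dna: str) -> List[str]:
    """Split a coding sequence into complete codons (any trailing bases are dropped)."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna))


def codon_usage(reference_genes: Iterable[str]) -> Counter:
    """Count codon occurrences across the reference (well-expressed) genes."""
    counts: Counter = Counter()
    for gene in reference_genes:
        counts.update(codons(gene))
    return counts


def optimize(gene: str, usage: Counter, min_fraction: float = 0.1) -> str:
    """Swap rarely used codons for the most used synonym; leave the rest alone."""
    out = []
    for codon in codons(gene):
        aa = CODON_TABLE[codon]
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(usage[c] for c in synonyms)
        # Only change a codon when the reference set has data for this amino
        # acid and the codon is rare within its synonymous family.
        if total and usage[codon] / total < min_fraction:
            codon = max(synonyms, key=lambda c: usage[c])
        out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    usage = codon_usage(["ATGGCCGCCGCCAAGAAG"])  # toy reference gene
    print(optimize("ATGGCAAAAGCC", usage))
```

Because only codons below the cutoff are touched, the encoded amino acid sequence is unchanged and most of the original codons are retained, which mirrors the statement that only a subset of codons was changed.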

B. Nucleic Acid Constructs/Expression Vectors.

[86] Natural or synthetic polynucleotide fragments encoding a protease inhibitor
("PI-encoding nucleic acid sequences") may be incorporated into heterologous nucleic acid
constructs or vectors, capable of introduction into, and replication in, a filamentous fungal 
cell. The vectors and methods disclosed herein are suitable for use in host cells for the 
expression of a protease inhibitor and variants thereof. Any vector may be used as long as it 
is replicable and viable in the cells into which it is introduced. Large numbers of suitable 
vectors and promoters are known to those of skill in the art, and are commercially available. 
Appropriate cloning and expression vectors for use in filamentous fungal cells are also 
described in Sambrook et al., 1989, and Ausubel FM et al., 1989, expressly incorporated by 
reference herein. The appropriate DNA sequence may be inserted into a plasmid or vector 
(collectively referred to herein as "vectors") by a variety of procedures. In general, the DNA 
sequence is inserted into an appropriate restriction endonuclease site(s) by standard 
procedures. Such procedures and related sub-cloning procedures are deemed to be within 
the scope of knowledge of those skilled in the art. 






ATTORNEY DOCKET NO. GC815P 
PROVISIONAL PATENT APPLICATION 



[87] Appropriate vectors are typically equipped with a selectable marker-encoding nucleic 
acid sequence, insertion sites, and suitable control elements, such as termination 
sequences. The vector may comprise regulatory sequences, including, for example,
non-coding sequences, such as introns and control elements, i.e., promoter and terminator
elements or 5' and/or 3' untranslated regions, effective for expression of the coding
sequence in host cells (and/or in a vector or host cell environment in which a modified
soluble protein coding sequence is not normally expressed), operably linked to the coding
sequence. Large numbers of suitable vectors and promoters are known to those of skill in
the art, many of which are commercially available and/or are described in Sambrook, et al.,
(supra).

[88] Exemplary promoters include both constitutive promoters and inducible promoters, 
examples of which include a CMV promoter, an SV40 early promoter, an RSV promoter, an 
EF-1α promoter, a promoter containing the tet responsive element (TRE) in the tet-on or
tet-off system as described (ClonTech and BASF), the beta actin promoter and the
metallothionein promoter that can be upregulated by addition of certain metal salts. In one
embodiment of this invention, the glaA promoter is used. This promoter is induced in the
presence of maltose. Such promoters are well known to those of skill in the art. 
[89] Those skilled in the art are aware that a natural promoter can be modified by 
replacement, substitution, addition or elimination of one or more nucleotides without 
changing its function. The practice of the invention encompasses and is not constrained by 
such alterations to the promoter. 

[90] The choice of promoter used in the genetic construct is within the knowledge of one 
skilled in the art. 

[91] The choice of the proper selectable marker will depend on the host cell, and 
appropriate markers for different hosts are well known in the art. Typical selectable marker 
genes encode proteins that (a) confer resistance to antibiotics or other toxins, for example, 
ampicillin, methotrexate, tetracycline, neomycin (Southern and Berg, J., 1982), 
mycophenolic acid (Mulligan and Berg, 1980), puromycin, zeomycin, or hygromycin (Sugden 
et al., 1985) or (b) complement an auxotrophic mutation or a naturally occurring nutritional
deficiency in the host strain. In a preferred embodiment, a fungal pyrG gene is used as a
selectable marker (Ballance, D.J. et al., 1983, Biochem. Biophys. Res. Commun. 112:284-289).
In another preferred embodiment, a fungal amdS gene is used as a selectable marker
(Tilburn, J. et al., 1983, Gene 26:205-221).

[92] A selected PI coding sequence may be inserted into a suitable vector according to
well-known recombinant techniques and used to transform a cell line capable of PI
expression. Due to the inherent degeneracy of the genetic code, other nucleic acid
sequences which encode substantially the same or a functionally equivalent amino acid
sequence may be used to clone and express a specific protease inhibitor, as further detailed
above. Therefore it is appreciated that such substitutions in the coding region fall within the
sequence variants covered by the present invention. Any and all of these sequence variants
can be utilized in the same way as described herein for a parent PI-encoding nucleic acid
sequence. One skilled in the art will recognize that differing PIs will be encoded by differing
nucleic acid sequences.
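By way of illustration only, the degeneracy of the genetic code referred to above can be shown with a short calculation. This Python sketch assumes the standard genetic code; the three-residue peptide and the two DNA strings are invented for the example, and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encoding_count(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(sum(1 for a in CODON_TABLE.values() if a == aa) for aa in peptide)


if __name__ == "__main__":
    # Two different toy DNA sequences that encode the same peptide, Met-Ala-Lys.
    print(translate("ATGGCTAAA"), translate("ATGGCCAAG"))
    print(encoding_count("MAK"))
```

The count grows multiplicatively with peptide length, which is why many distinct nucleic acid sequences can encode the same protease inhibitor.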

[93] Once the desired form of a protease inhibitor nucleic acid sequence, homologue,
variant or fragment thereof, is obtained, it may be modified in a variety of ways. Where the 
sequence involves non-coding flanking regions, the flanking regions may be subjected to 
resection, mutagenesis, etc. Thus, transitions, transversions, deletions, and insertions may 
be performed on the naturally occurring sequence. 

[94] Heterologous nucleic acid constructs may include the coding sequence for a
protease inhibitor, or a variant, fragment or splice variant thereof: (i) in isolation; (ii) in
combination with additional coding sequences, such as fusion protein or signal peptide
coding sequences, where the PI coding sequence is the dominant coding sequence; (iii) in
combination with non-coding sequences, such as introns and control elements, such as 
promoter and terminator elements or 5' and/or 3' untranslated regions, effective for 
expression of the coding sequence in a suitable host; and/or (iv) in a vector or host 
environment in which the PI coding sequence is a heterologous gene.

[95] A heterologous nucleic acid containing the appropriate nucleic acid coding sequence,
as described above, together with appropriate promoter and control sequences, may be
employed to transform filamentous fungal cells to permit the cells to express a protease
inhibitor or variant thereof.

[96] In one aspect of the present invention, a heterologous nucleic acid construct is
employed to transfer a PI-encoding nucleic acid sequence into a cell in vitro, with
established cell lines preferred. Preferably, cell lines that are to be used as production hosts 
have the nucleic acid sequences of this invention stably integrated. It follows that any 
method effective to generate stable transformants may be used in practicing the invention.

[97] In one aspect of the present invention, the first and second expression cassettes may
be present on a single vector or on separate vectors.

[98] The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, and recombinant DNA, which
are within the skill of the art. Such techniques are explained fully in the literature. See, for
example, "Molecular Cloning: A Laboratory Manual", Second Edition (Sambrook, Fritsch &
Maniatis, 1989), "Animal Cell Culture" (R. I. Freshney, ed., 1987); and "Current Protocols in
Molecular Biology" (F. M. Ausubel et al., eds., 1987). All patents, patent applications,
articles and publications mentioned herein, both supra and infra, are hereby expressly 
incorporated herein by reference. 

[99] In addition to a promoter sequence, the expression cassette should also contain a 
transcription termination region downstream of the structural gene to provide for efficient 
termination. The termination region may be obtained from the same gene as the promoter 
sequence or may be obtained from different genes, also within the knowledge of one skilled 
in the art. 

[100] The particular expression vector used to transport the genetic information into the cell 
is not particularly critical. Any of the conventional vectors used for expression in eukaryotic 
or prokaryotic cells may be used. Standard bacterial expression vectors include 
bacteriophages λ and M13, as well as plasmids such as pBR322 based plasmids, pSKF,
pET23D, and fusion expression systems such as MBP, GST, and LacZ. Epitope tags can
also be added to recombinant proteins to provide convenient methods of isolation, e.g.,
c-myc.

[101] The elements that are typically included in expression vectors also include a replicon,
a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant
plasmids, and unique restriction sites in nonessential regions of the plasmid to allow
insertion of heterologous sequences. The particular antibiotic resistance gene chosen is not
critical; any of the many resistance genes known in the art are suitable.

C. Host Cells and Culture Conditions. 

[102] The present invention provides cell lines comprising cells which have been modified, 
selected and cultured in a manner effective to result in expression of a protease inhibitor and 
variants thereof. 

[103] Examples of parental cell lines which may be treated and/or modified for PI 
expression include, but are not limited to, filamentous fungal cells. Examples of appropriate 
primary cell types for use in practicing the invention include, but are not limited to, 
Aspergillus and Trichoderma.

[104] Protease inhibitor expressing cells are cultured under conditions typically employed
to culture the parental cell line. Generally, cells are cultured in a standard medium 
containing physiological salts and nutrients, such as standard RPMI, MEM, IMEM or DMEM, 
typically supplemented with 5-10% serum, such as fetal bovine serum. Culture conditions 
are also standard, e.g., cultures are incubated at 37°C in stationary or roller cultures until 
desired levels of protease inhibitor expression are achieved. 
[105] Preferred culture conditions for a given cell line may be found in the scientific 
literature and/or from the source of the cell line such as the American Type Culture 
Collection (ATCC; "http://www.atcc.org/"). Typically, after cell growth has been established, 
the cells are exposed to conditions effective to cause or inhibit the expression of a protease 
inhibitor and variants thereof. 

[106] In the preferred embodiments, where a PI coding sequence is under the control of an 
inducible promoter, the inducing agent, e.g., a carbohydrate, metal salt or antibiotics, is 
added to the medium at a concentration effective to induce protease inhibitor expression. 
D. Introduction Of A Protease Inhibitor-Encoding Nucleic Acid Sequence Into Host
Cells. 

[107] The methods of transformation used may result in the stable integration of all or part 
of the transformation vector into the genome of the filamentous fungus. However, 
transformation resulting in the maintenance of a self-replicating extra-chromosomal 
transformation vector is also contemplated. 

[108] The invention further provides cells and cell compositions which have been 
genetically modified to comprise an exogenously provided PI-encoding nucleic acid
sequence. A parental cell or cell line may be genetically modified (i.e., transduced,
transformed or transfected) with a cloning vector or an expression vector. The vector may
be, for example, in the form of a plasmid, a viral particle, a phage, etc., as further described
above. In a preferred embodiment, a plasmid is used to transfect a filamentous fungal cell. 
The transformations may be sequential or by co-transformation. 

[109] Various methods may be employed for delivering an expression vector into cells in 
vitro. Methods of introducing nucleic acids into cells for expression of heterologous nucleic 
acid sequences are also known to the ordinarily skilled artisan, including, but not limited to 
electroporation; nuclear microinjection or direct microinjection into single cells; protoplast 
fusion with intact cells; use of polycations, e.g., polybrene or polyornithine; or PEG;
membrane fusion with liposomes, lipofectamine or lipofection-mediated transfection; high
velocity bombardment with DNA-coated microprojectiles; incubation with calcium
phosphate-DNA precipitate; DEAE-Dextran mediated transfection; infection with modified
viral nucleic acids; Agrobacterium-mediated transfer of DNA; and the like. In addition, heterologous
nucleic acid constructs comprising a PI-encoding nucleic acid sequence can be transcribed
in vitro, and the resulting RNA introduced into the host cell by well-known methods, e.g., by 
injection. 

[110] Following introduction of a heterologous nucleic acid construct comprising the coding 
sequence for a protease inhibitor, the genetically modified cells can be cultured in 
conventional nutrient media modified as appropriate for activating promoters, selecting 
transformants or amplifying expression of a PI-encoding nucleic acid sequence. The culture
conditions, such as temperature, pH and the like, are those previously used for the host cell
selected for expression, and will be apparent to those skilled in the art.

[111] The progeny of cells into which such heterologous nucleic acid constructs have been
introduced are generally considered to comprise the PI-encoding nucleic acid sequence
found in the heterologous nucleic acid construct. 
E. Fungal Expression 

[112] Appropriate host cells include filamentous fungal cells. The "filamentous fungi" of the 
present invention, which serve both as the expression hosts and the source of the first and 
second nucleic acids, are eukaryotic microorganisms and include all filamentous forms of the 
subdivision Eumycotina, Alexopoulos, C.J. (1962), Introductory Mycology, New York: Wiley. 
These fungi are characterized by a vegetative mycelium with a cell wall composed of chitin, 
glucans, and other complex polysaccharides. The filamentous fungi of the present invention 
are morphologically, physiologically, and genetically distinct from yeasts. Vegetative growth 
by filamentous fungi is by hyphal elongation. In contrast, vegetative growth by yeasts such 
as S. cerevisiae is by budding of a unicellular thallus. Illustrations of differences between S. 
cerevisiae and filamentous fungi include the inability of S. cerevisiae to process Aspergillus 
and Trichoderma introns and the inability to recognize many transcriptional regulators of 
filamentous fungi (Innis, M.A. et al. (1985) Science, 228, 21-26).

[113] Various species of filamentous fungi may be used as expression hosts including the
following genera: Aspergillus, Trichoderma, Neurospora, Penicillium, Cephalosporium,
Achlya, Phanerochaete, Podospora, Endothia, Mucor, Fusarium, Humicola, and
Chrysosporium. Specific expression hosts include A. nidulans (Yelton, M., et al. (1984)
Proc. Natl. Acad. Sci. USA, 81, 1470-1474; Mullaney, E.J. et al. (1985) Mol. Gen. Genet.
199, 37-45; John, M.A. and J.F. Peberdy (1984) Enzyme Microb. Technol. 6, 386-389;
Tilburn, et al. (1983) Gene 26, 205-221; Ballance, D.J. et al., (1983) Biochem. Biophys. Res.
Comm. 112, 284-289; Johnston, I.L. et al. (1985) EMBO J. 4, 1307-1311), A. niger (Kelly,
J.M. and M. Hynes (1985) EMBO J. 4, 475-479), A. niger var. awamori, e.g., NRRL 3112,
ATCC 22342, ATCC 44733, ATCC 14331 and strain UVK 143f, A. oryzae, e.g., ATCC
11490, N. crassa (Case, M.E. et al. (1979) Proc. Natl. Acad. Sci. USA 76, 5259-5263;
Lambowitz U.S. Patent No. 4,486,553; Kinsey, J.A. and J.A. Rambosek (1984) Molecular
and Cellular Biology 4, 117-122; Bull, J.H. and J.C. Wooton (1984) Nature 310, 701-704),
Trichoderma reesei, e.g. NRRL 15709, ATCC 13631, 56764, 56765, 56466, 56767, and
Trichoderma viride, e.g., ATCC 32098 and 32086. A preferred expression host is A. niger 
var. awamori in which the gene encoding the major secreted aspartyl protease has been 
deleted. The production of this preferred expression host is described in United States 
Patent Application Serial No. 214,237 filed July 1, 1988, expressly incorporated herein by 
reference. 

[114] During the secretion process in fungi, which are eukaryotes, the secreted protein 
crosses the membrane from the cytoplasm into the lumen of the endoplasmic reticulum 
(ER). It is here that the protein folds and disulphide bonds are formed. Chaperone proteins 
such as BiP and proteins like protein disulphide isomerase assist in this process. It is also at 
this stage where sugar chains are attached to the protein to produce a glycosylated protein. 
Sugars are typically added to asparagine residues as N-linked glycosylation or to serine or
threonine residues as O-linked glycosylation. Correctly folded and glycosylated proteins 
pass from the ER to the Golgi apparatus where the sugar chains are modified and where the 
KEX2 or KEXB protease of yeast and fungi resides. The N-linked glycosylation added to 
secreted proteins produced in fungi differs from that added by mammalian cells. 
[115] Protease inhibitor and variants thereof produced by the filamentous fungal host cells 
may be either glycosylated or non-glycosylated (i.e., aglycosylated or deglycosylated). 
Because the fungal glycosylation pattern differs from that produced by mammalian cells, the 
protease inhibitor may be treated with an enzyme to deglycosylate the protease inhibitor.
Enzymes useful for such N-linked deglycosylation are endoglycosidase H, endoglycosidase
F1, endoglycosidase F2, endoglycosidase A, PNGase F, PNGase A, and PNGase At.
Enzymes useful for such O-linked deglycosylation are exoglycosidases, specifically
alpha-mannosidases (e.g., alpha-Mannosidase (Aspergillus saitoi, GKX-5009),
alpha(1-2, 3, 6)-Mannosidase (Jack bean, GKX-5010), and alpha-Mannosidase/MANase VI
(recombinant from Xanthomonas manihoti, GKX80070), all from Glyko (Prozyme), San Leandro,
California).
[116] We have surprisingly found that high levels of a protease inhibitor and variants
thereof can be made in fungi when fused to a native secreted protein. From the information
provided above it is clear that the protease inhibitor and variants thereof would be expected
to assemble in the ER when glucoamylase was still attached to the N-termini. This would
produce a large protein of greater than 56 kD. The glucoamylase would not be expected to
be cleaved from the desired protein when it passed through the Golgi apparatus without
further modification.

[117] Using the present inventive methods and host cells, we have attained surprising
levels of expression. The system utilized herein has achieved levels of expression and
secretion of greater than 0.5 g/l of protease inhibitor.

[118] After the expression vector is introduced into the cells, the transfected cells are 
cultured under conditions favoring expression of the gene encoding the desired protein. Large 
batches of transformed cells can be cultured as described above. Finally, product is 
recovered from the culture using techniques known in the art. 
CHAPERONES 

[119] As noted above, the folding and glycosylation of the secretory proteins in the ER are 
assisted by numerous ER-resident proteins called chaperones. Chaperones such as BiP 
(GRP78), GRP94 or yeast Lhs1p help the secretory protein to fold by binding to exposed 
hydrophobic regions in the unfolded state and preventing unfavourable interactions 
(Blond-Elguindi et al., 1993, Cell 75:717-728). The chaperones are also important for the 
translocation of the proteins through the ER membrane. Foldase proteins such as protein 
disulphide isomerase (pdi) and its homologs and peptidyl-prolyl cis-trans isomerase assist in 
formation of disulphide bridges and formation of the right conformation of the peptide chain 
adjacent to proline residues, respectively. 

[120] In one aspect of the invention the host cells are transformed with an expression 
vector encoding a chaperone. The chaperone is selected from the group consisting of pdiA 
and prpA. 

FERMENTATION PARAMETERS 

[121] The invention relies on fermentation procedures for culturing fungi. Fermentation 
procedures for production of heterologous proteins are known per se in the art. For example, 
proteins can be produced either by solid or submerged culture, including batch, fed-batch 
and continuous-flow processes. 

[122] Culturing is accomplished in a growth medium comprising an aqueous mineral salts 
medium, organic growth factors, the carbon and energy source material, molecular oxygen, 
and, of course, a starting inoculum of one or more particular microorganism species to be 
employed. 
[123] In addition to the carbon and energy source, oxygen, assimilable nitrogen, and an 
inoculum of the microorganism, it is necessary to supply suitable amounts in proper 
proportions of mineral nutrients to assure proper microorganism growth, maximize the 
assimilation of the carbon and energy source by the cells in the microbial conversion 
process, and achieve maximum cellular yields with maximum cell density in the fermentation 
media. 

[124] The composition of the aqueous mineral medium can vary over a wide range, 
depending in part on the microorganism and substrate employed, as is known in the art. The 
mineral media should include, in addition to nitrogen, suitable amounts of phosphorus, 
magnesium, calcium, potassium, sulfur, and sodium, in suitable soluble assimilable ionic and 
combined forms, and also present preferably should be certain trace elements such as 
copper, manganese, molybdenum, zinc, iron, boron, and iodine, and others, again in suitable 
soluble assimilable form, all as known in the art. 

[125] The fermentation reaction is an aerobic process in which the molecular oxygen 
needed is supplied by a molecular oxygen-containing gas such as air, oxygen-enriched air, 
or even substantially pure molecular oxygen, provided to maintain the contents of the 
fermentation vessel with a suitable oxygen partial pressure effective in assisting the 
microorganism species to grow in a thriving fashion. In effect, by using an oxygenated 
hydrocarbon substrate, the oxygen requirement for growth of the microorganism is reduced. 
Nevertheless, molecular oxygen must be supplied for growth, since the assimilation of the 
substrate and corresponding growth of the microorganisms is, in part, a combustion 
process. 

[126] Although the aeration rate can vary over a considerable range, aeration generally is 
conducted at a rate which is in the range of about 0.5 to 10, preferably about 0.5 to 7, 
volumes (at the pressure employed and at 25°C) of oxygen-containing gas per liquid volume 
in the fermentor per minute. This amount is based on air of normal oxygen content being 
supplied to the reactor, and in terms of pure oxygen the respective ranges would be about 
0.1 to 1.7, or preferably about 0.1 to 1.3, volumes (at the pressure employed and at 25°C) of 
oxygen per liquid volume in the fermentor per minute. 
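
The aeration range above is stated in gas volumes per liquid volume per minute. As a rough illustration only, the following sketch converts such a rate into a volumetric gas flow for a given liquid volume. The function name and the 10 L working volume are assumptions made for the example and are not taken from this disclosure.

```python
def gas_flow_l_per_min(vvm: float, liquid_volume_l: float) -> float:
    """Convert an aeration rate (gas volumes per liquid volume per minute)
    into a gas flow in litres per minute for the given liquid volume."""
    if vvm < 0 or liquid_volume_l <= 0:
        raise ValueError("vvm must be >= 0 and liquid volume must be > 0")
    return vvm * liquid_volume_l


# Air aeration range quoted in the text: about 0.5 to 10 volumes per volume per minute.
AIR_RANGE_VVM = (0.5, 10.0)

# Hypothetical working volume chosen only for illustration.
ASSUMED_WORKING_VOLUME_L = 10.0

if __name__ == "__main__":
    low, high = (gas_flow_l_per_min(v, ASSUMED_WORKING_VOLUME_L) for v in AIR_RANGE_VVM)
    print(f"Air flow for {ASSUMED_WORKING_VOLUME_L} L of broth: {low} to {high} L/min")
```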

[127] The pressure employed for the microbial conversion process can range widely. 
Pressures generally are within the range of about 0 to 50 psig, presently preferably about 0 
to 30 psig, more preferably at least slightly over atmospheric pressure, as a balance of 
equipment and operating cost versus oxygen solubility achieved. Greater than atmospheric 
pressures are advantageous in that such pressures tend to increase the dissolved oxygen 
concentration in the aqueous ferment, which in turn can help increase cellular growth rates. 
At the same time this is balanced by the fact that higher pressures increase 
equipment and operating costs. 
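
To illustrate why modest overpressure helps oxygen transfer, the sketch below converts gauge pressure to absolute pressure and estimates how the equilibrium dissolved oxygen concentration would scale relative to operation at atmospheric pressure. It assumes ideal Henry's-law behaviour and an unchanged gas composition, which is a simplification and not a statement from this disclosure. The function names are illustrative.

```python
ATM_PSI = 14.696  # one standard atmosphere expressed in psi


def psig_to_psia(psig: float) -> float:
    """Gauge pressure (psig) to absolute pressure (psia)."""
    return psig + ATM_PSI


def relative_oxygen_solubility(psig: float) -> float:
    """Approximate factor by which equilibrium dissolved oxygen rises
    compared with 0 psig, assuming Henry's law (solubility proportional
    to oxygen partial pressure) and a fixed gas composition."""
    return psig_to_psia(psig) / ATM_PSI


if __name__ == "__main__":
    # Bounds of the ranges quoted in the text.
    for psig in (0.0, 30.0, 50.0):
        print(f"{psig:5.1f} psig -> x{relative_oxygen_solubility(psig):.2f}")
```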

[128] The fermentation temperature can vary somewhat, but for filamentous fungi such as 
Aspergillus niger var. awamori the temperature generally will be within the range of about 
20°C to 40°C, generally preferably in the range of about 28°C to 37°C, depending on the 
strain of microorganism chosen. 

[129] The microorganisms also require a source of assimilable nitrogen. The source of 
assimilable nitrogen can be any nitrogen-containing compound or compounds capable of 
releasing nitrogen in a form suitable for metabolic utilization by the microorganism. While a 
variety of organic nitrogen source compounds, such as protein hydrolysates, can be 
employed, usually cheap nitrogen-containing compounds such as ammonia, ammonium 
hydroxide, urea, and various ammonium salts such as ammonium phosphate, ammonium 
sulfate, ammonium pyrophosphate, ammonium chloride, or various other ammonium 
compounds can be utilized. Ammonia gas itself is convenient for large scale operations, and 
can be employed by bubbling through the aqueous ferment (fermentation medium) in 
suitable amounts. At the same time, such ammonia can also be employed to assist in pH 
control. 

[130] The pH range in the aqueous microbial ferment (fermentation admixture) should be in 
the exemplary range of about 2.0 to 8.0. With filamentous fungi, the pH normally is within 
the range of about 2.5 to 8.0; with Aspergillus niger var. awamori, the pH normally is within 
the range of about 4.5 to 5.5. pH range preferences for certain microorganisms are 
dependent on the media employed to some extent, as well as the particular microorganism, 
and thus change somewhat with change in media as can be readily determined by those 
skilled in the art. 

[131] While the average retention time of the fermentation admixture in the fermentor can 
vary considerably, depending in part on the fermentation temperature and culture employed, 
generally it will be within the range of about 24 to 500 hours, preferably presently about 24 to 
400 hours. 
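
The operating windows given in the preceding paragraphs can be collected into a simple range check. The following is only a sketch: the dictionary keys and the function name are invented for illustration, and the limits are the broad ranges quoted above (temperature 20°C to 40°C, pH 2.0 to 8.0, retention time 24 to 500 hours, pressure 0 to 50 psig), not the narrower preferred ranges.

```python
# Broad operating ranges quoted in the text (lower bound, upper bound).
BROAD_RANGES = {
    "temperature_c": (20.0, 40.0),
    "ph": (2.0, 8.0),
    "retention_h": (24.0, 500.0),
    "pressure_psig": (0.0, 50.0),
}


def out_of_range(settings: dict) -> list:
    """Return the sorted names of settings that fall outside the broad ranges.

    Settings whose name is not listed in BROAD_RANGES are ignored.
    """
    flagged = []
    for name, (low, high) in BROAD_RANGES.items():
        if name in settings and not (low <= settings[name] <= high):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    run = {"temperature_c": 30.0, "ph": 5.0, "retention_h": 120.0, "pressure_psig": 5.0}
    print(out_of_range(run))
```

A check of this kind only confirms that a planned run sits inside the stated windows. The preferred values for a given strain and medium still have to be determined experimentally, as the text notes.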

[132] Preferably, the fermentation is conducted in such a manner that the carbon- 
containing substrate can be controlled as a limiting factor, thereby providing good conversion 
of the carbon-containing substrate to cells and avoiding contamination of the cells with a 
substantial amount of unconverted substrate. The latter is not a problem with water-soluble 
substrates, since any remaining traces are readily washed off. It may be a problem, 
however, in the case of non-water-soluble substrates, and may require added product-treatment 
steps such as suitable washing steps. 

[133] As described above, the time to reach this limiting substrate level is not critical and 
may vary with the particular microorganism and fermentation process being conducted. 
However, it is well known in the art how to determine the carbon source concentration in the 
fermentation medium and whether or not the desired level of carbon source has been 
achieved. 

[134] Although the fermentation can be conducted as a batch or continuous operation, fed-batch 
operation is generally preferred for ease of control, production of uniform quantities of 
products, and most economical use of all equipment. 

[135] If desired, part or all of the carbon and energy source material and/or part of the 
assimilable nitrogen source such as ammonia can be added to the aqueous mineral medium 
prior to feeding the aqueous mineral medium to the fermentor. 
[136] Each of the streams introduced into the reactor preferably is controlled at a 
predetermined rate, or in response to a need determinable by monitoring such as 
concentration of the carbon and energy substrate, pH, dissolved oxygen, oxygen or carbon 
dioxide in the off-gases from the fermentor, cell density measurable by light transmittancy, or 
the like. The feed rates of the various materials can be varied so as to obtain as rapid a cell 
growth rate as possible, consistent with efficient utilization of the carbon and energy source, 
to obtain as high a yield of microorganism cells relative to substrate charge as possible, but 
more importantly to obtain the highest production of the desired protein per unit volume. 
[137] In either a batch or the preferred fed-batch operation, all equipment, reactor, or 
fermentation means, vessel or container, piping, attendant circulating or cooling devices, and 
the like, are initially sterilized, usually by employing steam such as at about 121°C for at 
least about 15 minutes. The sterilized reactor then is inoculated with a culture of the 
selected microorganism in the presence of all the required nutrients, including oxygen, and 
the carbon-containing substrate. The type of fermentor employed is not critical, though 
presently preferred is operation in a 15 L Biolafitte fermentor (Saint-Germain-en-Laye, France). 
PROTEIN SEPARATIONS 

[138] Once the desired protein is expressed and, optionally, secreted, recovery of the 
desired protein may be necessary. The present invention provides methods of separating a 
desired protein from its fusion analog. It is specifically contemplated that the methods 
described herein are useful for the separation of proteinase inhibitor and variants from the 
fusion analog. 
[139] The collection and purification of the desired protein from the fermentation broth can 
also be done by procedures known per se in the art. The fermentation broth will generally 
contain, in addition to the desired protein product, cellular debris, including cells, various 
suspended solids and other biomass contaminants, which are preferably removed from the 
fermentation broth by means known in the art. 

[140] Suitable processes for such removal include conventional solid-liquid separation 
techniques such as, e.g., centrifugation, filtration, dialysis, microfiltration, rotary vacuum 
filtration, or other known processes, to produce a cell-free filtrate. It may be preferable to 
further concentrate the fermentation broth or the cell-free filtrate prior to crystallization using 
techniques such as ultrafiltration, evaporation or precipitation. 

[141] Precipitating the proteinaceous components of the supernatant or filtrate may be 
accomplished by means of a salt, e.g., ammonium sulfate, or by adjusting the pH to 2 to 3 and 
then heat treating the broth at 80°C for 2 hours, followed by purification by a variety of 
chromatographic procedures, e.g., ion exchange chromatography, affinity chromatography 
or similar art-recognized procedures. 

[142] When the expressed desired polypeptide is secreted, the polypeptide may be purified 
from the growth media. Preferably the expression host cells are removed from the media 
before purification of the polypeptide (e.g. by centrifugation). 

[143] When the expressed recombinant desired polypeptide is not secreted from the host 
cell, the host cell is preferably disrupted and the polypeptide released into an aqueous 
"extract" which is the first stage of purification. Preferably the expression host cells are 
collected from the media before the cell disruption (e.g. by centrifugation). 
[144] The cell disruption may be performed by conventional techniques such as 
lysozyme or beta-glucanase digestion or forcing the cells through high pressure. See 
Robert K. Scopes, Protein Purification, Second edition, Springer-Verlag, for further 
description of such cell disruption techniques. 

[145] The addition of six histidine residues, i.e., a His tag, to the C-terminus may also aid 
in the purification of the desired protein and its fusion analog. Use of the His tag as a 
purification aid is well known in the art. See, for example, Hengen (1995) TIBS 20(7):285-286. 
The 6x His-tagged proteins are easily purified using Immobilized Metal ion Affinity 
Chromatography (IMAC). 

[146] It is specifically contemplated that protease inhibitors and variants thereof may be 
purified from an aqueous protein solution, e.g., whole cell fermentation broth or clarified 
broth, using hydrophobic charge induction chromatography (HCIC). HCIC 
provides an ability to separate the desired protein from the broth and from its fusion analog. 
UTILITY 

[147] For some applications of desired proteins it is of high importance that the protease 
inhibitors are extremely pure, e.g. having a purity of more than 99%. This is particularly true 
whenever the desired protein is to be used as a therapeutic, but is also necessary for other 
applications. The methods described herein provide a way of producing substantially pure 
desired proteins. The desired proteins described herein are useful in pharmaceutical and 
personal care compositions. 

[148] In the experimental disclosure which follows, the following abbreviations apply: eq 
(equivalents); M (Molar); µM (micromolar); N (Normal); mol (moles); mmol (millimoles); µmol 
(micromoles); nmol (nanomoles); g (grams); mg (milligrams); kg (kilograms); µg 
(micrograms); L (liters); ml (milliliters); µl (microliters); cm (centimeters); mm (millimeters); 
µm (micrometers); nm (nanometers); °C (degrees Centigrade); h (hours); min (minutes); 
sec (seconds); msec (milliseconds); Ci (Curies); mCi (milliCuries); µCi (microCuries); TLC 
(thin layer chromatography); Ts (tosyl); Bn (benzyl); Ph (phenyl); Ms (mesyl); Et (ethyl); Me 
(methyl); PI (proteinase inhibitor); BBI (Bowman-Birk inhibitor); STI (Soybean Trypsin 
inhibitor). 

EXAMPLES 

[149] The present invention is described in further detail in the following examples which 
are not in any way intended to limit the scope of the invention as claimed. The attached 
Figures are meant to be considered as integral parts of the specification and description of 
the invention. All references cited are herein specifically incorporated by reference for all 
that is described therein. The following examples are offered to illustrate, but not to limit the 
claimed invention. 

Example 1 

Cloning of DNA encoding the Soybean Trypsin Inhibitor 
[150] This example illustrates the development of an expression vector for STI. 
[151] In general, the gene encoding the desired protein was fused to the DNA encoding the 
linker region of glucoamylase with an engineered kexB cleavage site (NVISKR) via an NheI 
restriction enzyme site at the N-terminal and a BstEII restriction enzyme site at the 
C-terminal following the STI stop codon, TAG. The gene encoding the soybean STI was 
synthesized by MCLAB (South San Francisco, California) in vitro as a DNA fragment 
containing two restriction sites, a kexB cleavage site and three glycine residues at the N-terminal 
end and six histidine residues at the C-terminal end (SEQ ID NO:3, gene shown in Figure 2). 
All PCR-generated DNA fragments used herein were initially cloned into the pCRII-TOPO 
vector (Invitrogen, Carlsbad, CA). E. coli [One Shot® TOP10 cells from Invitrogen] was 
used for routine plasmid isolation and plasmid maintenance. The NheI and BstEII sites were 
used to excise the PCR product from the pCRII-TOPO vector, and the resulting DNA 
fragment was then ligated into the expression vector, pSL1180-GAMpR-2 (see Figure 5). The 
expression vector, pSL1180-GAMpR2, contains the Aspergillus niger glucoamylase 
promoter, the glucoamylase catalytic domain and the terminator region. The expression 
plasmid also contains the A. niger pyrG gene as the selection marker. Thus, detection of 
transformants with the expression cassette is by growth on uridine-deficient medium. 
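
The layout of the fusion described above (glucoamylase carrier, engineered kexB site NVISKR, three glycine residues, the inhibitor, six histidine residues) can be sketched at the amino acid level as follows. This is an illustration only: the carrier and inhibitor strings used in the example are short made-up placeholders, not the actual glucoamylase or STI sequences, and the function names are hypothetical. The sketch also assumes that cleavage occurs immediately after the KR of the engineered site, so that the released product begins with the glycine residues.

```python
KEXB_SITE = "NVISKR"   # engineered kexB cleavage site named in the text
GLY_SPACER = "GGG"     # three glycine residues at the N-terminal end
HIS_TAG = "H" * 6      # six histidine residues at the C-terminal end


def build_fusion(carrier: str, inhibitor: str) -> str:
    """Assemble carrier + kexB site + GGG + inhibitor + His6 as one polypeptide string."""
    return carrier + KEXB_SITE + GLY_SPACER + inhibitor + HIS_TAG


def kexb_cleave(fusion: str) -> tuple:
    """Split the fusion directly after the first occurrence of the engineered site.

    Returns (carrier part ending in ...NVISKR, released product).
    Assumes the carrier itself does not contain the site.
    """
    idx = fusion.find(KEXB_SITE)
    if idx < 0:
        raise ValueError("engineered kexB site not found")
    cut = idx + len(KEXB_SITE)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    fusion = build_fusion(carrier="ATLDSW", inhibitor="DFVLDNEG")
    carrier_part, product = kexb_cleave(fusion)
    print(carrier_part)
    print(product)
```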
[152] The gene encoding the STI peptide (for amino acid sequence: Figure 4A, SEQ ID 
NO:10; for nucleotide sequence: Figure 2 and SEQ ID NO:6) was synthesized and cloned 
into pCRII-TOPO vector (Invitrogen) by MCLAB. The NheI to BstEII fragment was released 
from the plasmid by restriction digestion and the DNA fragment was extracted from an 
agarose gel and cloned into pSLGAMpR2, a glucoamylase-chymosin expression vector 
which is described in detail in WO 9831821, to create expression plasmid 
pSLGAMpR2-SBTI/nonopti (Q110). 

[153] The expression plasmid was transformed into dgr246AGAP:pyr2-. This strain is 
derived from strain dgr246 P2 which has the pepA gene deleted, is pyrG minus and has 
undergone several rounds of mutagenesis and screening or selection for improved 
production of a heterologous gene product (Ward, M. et al., 1993, Appl. Microbiol. Biotech. 
39:738-743 and references therein). To create strain dgr246AGAP:pyr2- the glaA 
(glucoamylase) gene was deleted in strain dgr246 P2 using exactly the same deletion 
plasmid (pAGAM NB-Pyr) and procedure as reported by Fowler, T. et al (1990) Curr. Genet. 
18:537-545. Briefly, the deletion was achieved by transformation with a linear DNA fragment 
having glaA flanking sequences at either end and with part of the promoter and coding 
region of the glaA gene replaced by the Aspergillus nidulans pyrG gene as selectable 
marker. Transformants in which the linear fragment containing the glaA flanking sequences 
and the pyrG gene had integrated at the chromosomal glaA locus were identified by 
Southern blot analysis. This change had occurred in transformed strain dgr246AGAP. 
Spores from this transformant were plated onto medium containing fluoroorotic acid and 
spontaneous resistant mutants were obtained as described by van Hartingsveldt, W. et al. 
(1987) Mol. Gen. Genet. 206:71-75. One of these, dgr246AGAP:pyr2-, was shown to be a 
uridine auxotroph strain which could be complemented by transformation with plasmids 
bearing a wild-type pyrG gene. 

[154] The Aspergillus transformation protocol was a modification of the Campbell method 
(Campbell et al. (1989) Curr. Genet. 16:53-56). All solutions and media were either 
autoclaved or filter sterilized through a 0.2 micron filter. Spores of A. niger var. awamori 
were harvested from complex media agar (CMA) plates. CMA contained 20 g/l dextrose, 20 
g/l Difco Brand malt extract, 1 g/l Bacto Peptone, 20 g/l Bacto agar, 20 ml/l of 100 mg/ml 
arginine and 20 ml/l of 100 mg/ml uridine. An agar plug of approximately 1.5 cm square of 
spores was used to inoculate 100 ml of liquid CMA (recipe as for CMA except that the 
Bacto agar was omitted). The flask was incubated at 37°C on a shaker at 250-275 rpm 
overnight. The mycelia were harvested through sterile Miracloth (Calbiochem, San Diego, 
CA, USA) and washed with 50 ml of solution A (0.8 M MgSO4 in 10 mM sodium phosphate, 
pH 5.8). The washed mycelia were placed in a sterile solution of 300 mg of 
beta-D-glucanase (Interspex Products, San Mateo, CA) in 20 ml of solution A. This was incubated 
at 28° to 30°C at 200 rpm for 2 hours in a sterile 250 ml plastic bottle (Corning Inc, Corning, 
New York). After incubation, this protoplasting solution was filtered through sterile Miracloth 
into a sterile 50 ml conical tube (Sarstedt, USA). The resulting liquid containing protoplasts 
was divided equally between two 50 ml conical tubes. Forty ml of solution B (1.2 M sorbitol, 
50 mM CaCl2, 10 mM Tris, pH 7.5) was added to each tube, and the tubes were centrifuged in a 
table top clinical centrifuge (Damon IEC HN SII centrifuge) at full speed for 5 minutes. The 
supernatant from each tube was discarded and 20 ml of fresh solution B was added to one 
tube, mixed, then poured into the next tube until all the pellets were resuspended. The tube 
was then centrifuged for 5 minutes. The supernatant was discarded, 20 ml of fresh solution 
B was added, and the tube was centrifuged for 5 minutes. The wash was repeated one last time 
before resuspending the washed protoplasts in solution B at a density of 0.5-1.0 x 10^7 
protoplasts/100 ul. To each 100 ul of protoplasts in a sterile 15 ml conical tube (Sarstedt, 
USA), 10 ul of the transforming plasmid DNA was added. To this, 12.5 ul of solution C (50% 
PEG 4000, 50 mM CaCl2, 10 mM Tris, pH 7.5) was added and the tube was placed on ice 
for 20 minutes. One ml of solution C was added and the tube was removed from the ice to 
room temperature and shaken gently. Two ml of solution B was added immediately to dilute 
solution C. The transforming mix was added equally to 3 tubes of melted MMS overlay (6 g/l 
NaNO3, 0.52 g/l KCl, 1.52 g/l KH2PO4, 218.5 g/l D-sorbitol, 1.0 ml/l trace elements-LW, 10 g/l 
SeaPlaque agarose (FMC BioProducts, Rockland, Maine, USA), 20 ml/l 50% glucose, 2.5 
ml/l 20% MgSO4·7H2O, pH to 6.5 with NaOH) that were stored in a 45°C water bath. Trace 
elements-LW consisted of 1 g/l FeSO4·7H2O, 8.8 g/l ZnSO4·7H2O, 0.4 g/l CuSO4·5H2O, 0.15 
g/l MnSO4·4H2O, 0.1 g Na2B4O7·10H2O, 50 mg/l (NH4)6Mo7O24·4H2O, 250 ml H2O, 200 ul/l 
concentrated HCl. The melted overlays with the transformation mix were immediately 
poured onto 3 MMS plates (same as MMS overlay recipe with the exception of 20 g/l of 
Bacto agar instead of 10 g/l of SeaPlaque agarose) that had been supplemented with 333 
ul/plate of 100 mg/ml of arginine added directly on top of the agar plate. After the agarose 
solidified, the plates were incubated at 30°C until transformants grew. 
[155] The sporulating transformants were picked off with a sterile toothpick onto a plate of 
minimal media + glucose (MM). MM consisted of 6 g/l NaNO3, 0.52 g/l KCl, 1.52 g/l 
KH2PO4, 1 ml/l Trace elements-LW, 20 g/l Bacto agar, pH to 6.5 with NaOH, 25 ml/l of 40% 
glucose, 2.5 ml/l of 20% MgSO4·7H2O and 20 ml/l of 100 mg/ml arginine. Once the 
transformants grew on MM they were transferred to CMA plates. 

[156] A 1.5 cm square agar plug from a plate culture of each transformant was added to 50 
ml, in a 250 ml shake flask, of production medium called Promosoy special. This medium 
had the following components: 70 g/l sodium citrate, 15 g/l (NH4)2SO4, 1 g/l NaH2PO4·H2O, 1 
g/l MgSO4, 1 ml Tween 80, pH to 6.2 with NaOH, 2 ml/l Mazu DF60-P, 45 g/l Promosoy 100 
(Central Soya, Fort Wayne, IN), 120 g/l maltose. The production media flasks were 
incubated at 30°C, 200 rpm for 5 days and supernatant samples were harvested. 
Transformants were assayed for protein production on SDS gel to select the transformants 
based on the amount of protein produced. Broth from the top transformants was assayed 
for trypsin or chymotrypsin inhibition activity. 
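
Because the medium is specified per litre but used in 50 ml portions, it can be convenient to scale the solid components to the flask volume. The sketch below does this for the components of Promosoy special that are given in g/l. The dictionary and function names are illustrative, and the liquid additives (Tween 80, Mazu DF60-P) and the pH adjustment are deliberately left out.

```python
# Solid components of "Promosoy special" as listed in the text, in g per litre.
PROMOSOY_SPECIAL_G_PER_L = {
    "sodium citrate": 70.0,
    "(NH4)2SO4": 15.0,
    "NaH2PO4.H2O": 1.0,
    "MgSO4": 1.0,
    "Promosoy 100": 45.0,
    "maltose": 120.0,
}


def grams_for_volume(recipe_g_per_l: dict, volume_ml: float) -> dict:
    """Scale a g/l recipe to the grams needed for the given volume in ml."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return {name: g_per_l * volume_ml / 1000.0 for name, g_per_l in recipe_g_per_l.items()}


if __name__ == "__main__":
    # 50 ml of medium per 250 ml shake flask, as in the example.
    for name, grams in grams_for_volume(PROMOSOY_SPECIAL_G_PER_L, 50.0).items():
        print(f"{name}: {grams:.3f} g")
```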

[157] A 1.5 cm square agar plug from a plate culture of each transformant was also added 
to 50 ml, in a 250 ml shake flask, of production medium called modified CSS. This medium 
had the following components: 50 g/l Corn Steep Solids, 1 g/l NaH2PO4·H2O, 0.5 g/l MgSO4 
(anhydrous), 50 g/l Staley 7350 (55%) and 8 g/l Na Citrate. The production media flasks were 
incubated at 36°C, 200 rpm for 3 days and supernatant samples were harvested and 
assayed for protein production on SDS gel. Broth from the top transformants was assayed 
for trypsin or chymotrypsin inhibition activity. 

Example 2 

Codon optimization of the DNA encoding the Soybean Trypsin Inhibitor 
[158] The following example details how the STI-encoding DNA was altered for optimized 
expression in a filamentous fungus. 
[159] The codons from the synthetic gene (the starting material in Example 1 that was 
synthesized by MCLAB) were then optimized according to the codon usage of highly 
expressed proteins in Aspergillus. Basically, proteins that expressed well, such as 
glucoamylase, alpha-amylase and prochymosin, were compared to proteins that did not 
express well in Aspergillus, such as human NEP and DPP4. See Table I. The codon usage 
table for both types of protein expression in Aspergillus is in Table II. 
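
A codon usage comparison of this kind starts from a per-gene tally of codons. The following sketch shows one way such a tally could be produced from a coding sequence. It is not the procedure actually used to build the tables in this disclosure, and the function name is illustrative. Codons are reported in lower-case RNA notation (u instead of t).

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence read in frame from its first base.

    The sequence may be given as DNA or RNA in either case. Codons are
    returned as lower-case RNA triplets such as 'gcc' or 'aug'.
    """
    seq = cds.strip().lower().replace("t", "u")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # Short made-up sequence, for illustration only.
    for codon, n in sorted(codon_counts("ATGGCCGCCAATTAG").items()):
        print(codon, n)
```

Running the same tally over well expressed and poorly expressed genes, and placing the counts side by side, gives a comparison of the kind the text describes.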



Table II 

Codon usage of genes expressed in Aspergillus. Column key: NEP = NEP coding; DPP4; SCCE; 
bPC = bovine prochymosin; Chymo = chymotrypsin; Her2 = Her2 light chain; OxB = Stachybotrys 
oxidase B; OxA = Stachybotrys oxidase A; GA = glucoamylase without starch binding domain. 
Rows labelled "total" are per-amino-acid subtotals. An asterisk marks a codon flagged in the 
original table. "?" marks an entry that is illegible in the source. 

Codon  AA        NEP  DPP4  SCCE   bPC Chymo  Her2   OxB   OxA    GA
gca*   Ala(A)     19    15     1     0     1     1     2     4    10
gcc    Ala(A)     12     6     8    15    17     8    19    23    18
gcg    Ala(A)      1     2     0     1     1     2     1     0     9
gcu    Ala(A)     18    12     2     1     3     3    25    21    20
total  Ala(A)     50    35    11    17    22    14    47    48    57
aga*   Arg(R)     15    16     3     1     0     2     2     2     1
agg    Arg(R)      5     8     4     6     4     1     4     5     1
cga    Arg(R)      6     1     0     1     0     0     7     3     3
cgc    Arg(R)      2     1     4     1     1     2    12    11     6
cgg    Arg(R)      1     3     0     0     0     0     0     0     2
cgu    Arg(R)      4     1     0     0     1     4     7     8     4
total  Arg(R)     33    30    11     9     6     9    32    29    17
aac    Asn(N)     20    16     1    11     6     5    26    27    17
aau*   Asn(N)     36    23     9     4     3     1     5     6     6
total  Asn(N)     56    39    10    15     9     6    31    33    23
gac    Asp(D)     13    16    10    20    14     5    24    20    17
gau*   Asp(D)     28    27     4     2     1     5    18    19    18
total  Asp(D)     41    43    14    22    15    10    42    39    35
ugc    Cys(C)      7     4    10     3     8     4     1     0     6
ugu    Cys(C)      5     8     2     3     2     1     0     1     2
total  Cys(C)     12    12    12     6    10     5     1     1     8
caa*   Gln(Q)     14     ?     2     1     1     2     2     ?     4
cag    Gln(Q)     17     ?     7    24     6    12    14    12    11
total  Gln(Q)     31     ?     ?    25     ?     ?     ?     ?    15
gaa*   Glu(E)     36    31     ?     ?     ?     ?     ?     ?     7
gag    Glu(E)      ?     ?     ?     ?     ?     ?     ?     ?    11
total  Glu(E)      ?     ?     ?     ?     ?     ?     ?     ?    18
gga*   Gly(G)     15    20     5     1     1     ?     ?     ?     ?
ggc    Gly(G)     12     7    11    15    14     0     ?     ?     ?
ggg    Gly(G)      7     6     1    12     7     ?     ?     ?     ?
ggu    Gly(G)      7     7     4     3     1     4     ?     ?     ?
total  Gly(G)     41    40    21    31    23     ?    44     ?     ?
cac    His(H)      5     8     5     4     3     ?     ?     ?     ?
cau    His(H)      4    11     2     2     0     1     ?     ?     ?
total  His(H)      9    19     7     6     3     ?     ?     ?     ?
aua    Ile(I)      9    13     1     1     1     0     0     ?     ?
auc    Ile(I)     10    12     3    19     9     5     ?     ?     ?
auu*   Ile(I)     26    22     2     2     2     1     ?     ?     ?
total  Ile(I)     45    47     6    22    12     ?     ?     ?     ?
cua*   Leu(L)      5     6     0     0     0     0     ?     ?     ?
cuc    Leu(L)      7     8     8     5     4     3     ?     ?     ?
cug    Leu(L)      9    15    10    23    15    10    14     ?     ?
cuu    Leu(L)     15     7     0     1     0     0     ?     ?     ?
uua*   Leu(L)      7    10     1     0     0     0     0     ?     ?
uug*   Leu(L)     16     9     1     0     0     1     5     ?     ?
total  Leu(L)     59    55    20    29    19    14    43     ?     ?
aaa*   Lys(K)     32    27     3     6     1     5     0     0     ?
aag    Lys(K)      ?     ?     ?     ?    14     ?     7    19    11
total  Lys(K)     49    37    16    15    15    14     7    19    11
aug    Met(M)     14    14     6     8     2     1    17    12     3
total  Met(M)     14    14     6     8     2     1    17    12     3
uuc    Phe(F)     12    14     3    13     6     9    24    21    17
uuu*   Phe(F)     16    17     1     6     1     0     3     6     2
total  Phe(F)     28    31     4    19     7     9    27    27    19
cca*   Pro(P)      9    14     4     1     0     2     1     6     0
ccc    Pro(P)      6     2     7    11     8     6    20    17     8
ccg    Pro(P)      0     1     1     3     0     2     4     2     6
ccu    Pro(P)      7    10     2     1     ?     ?    21    16     3
total  Pro(P)     22    27    14    16    12    12    46    41    17
agc    Ser(S)      7    12     ?     ?     ?     ?     ?     ?    19
agu*   Ser(S)      7    14     ?     ?     ?     ?     ?     ?    10
uca*   Ser(S)     11    17     ?     ?     ?     ?     ?     ?     2
ucc    Ser(S)      ?     ?     ?     ?     ?     ?     ?     ?    16
ucg    Ser(S)      ?     ?     ?     ?     ?     ?     ?     ?    11
ucu    Ser(S)     11     ?     ?     ?     ?     ?     ?     ?    15
total  Ser(S)     43    64    18    35    23    33    36    29    73
uaa    Ter(.)      0     0     1     0     0     0     0     1     0
uag    Ter(.)      0     1     0     0     0     0     0     0     0
uga    Ter(.)      1     0     0     0     1     1     0     0     0
total  Ter(.)      1     1     1     0     1     1     0     1     ?
aca*   Thr(T)     10    21     1     5     4     2     2     1     4
acc    Thr(T)      9     6     9    13     ?    14    12    16    28
acg    Thr(T)      1     1     3     2     0     2     1     1     8
acu    Thr(T)     10    17     5     4     0     3    16    12    14
total  Thr(T)     30    45    18    24    17    21    31    30    54
ugg    Trp(W)     14    20     5     4     8     2    11    14    15
total  Trp(W)     14    20     5     4     8     2    11    14    15
uac    Tyr(Y)     11    26     4    17     2     9    21    24    16
uau*   Tyr(Y)     22    30     0     5     0     2     4     4     6
total  Tyr(Y)     33    56     4    22     2    11    25    28    22
gua*   Val(V)      5     7     0     2     0     1     0     2     2
guc    Val(V)      9    12     6     7    10    10    17    25    13
gug    Val(V)     13    14    10    14    14     4     8     6    14
guu    Val(V)     10    11     2     3     0     1    21    12     5
total  Val(V)     37    44    18    26    24    16    46    45    34
nnn    ???(X)      0     0     0     0     0     0     3     0     0
TOTAL            701   725   232   366   246   222   582   597   527

[160] It is evident that many codons were not used or not used as often in the genes that 
expressed well. These codons were found much more frequently in those genes that were 
not expressed well (indicated with an asterisk in Table II). In the STI gene, we identified 
several such codons that were not used or not used often by other well expressed proteins 
and the codons were changed to the codons that are used more often in well expressed 
proteins. See Tables III and IV. 
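
The codon tally that underlies a table of this kind can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used to build Table II; the function name `codon_usage` and the short demo sequence are assumptions introduced for the example.

```python
from collections import Counter


def codon_usage(cds: str) -> Counter:
    """Count codons in a coding sequence.

    Codons are reported in lower-case RNA notation (e.g. 'acc'),
    matching the notation used in the codon usage tables.
    """
    seq = "".join(cds.split()).lower().replace("t", "u")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


# Hypothetical demo sequence (Thr-Thr-Thr-Val-stop); not taken from the application.
demo = "ACC ACA ACC GTC TAA"
usage = codon_usage(demo)
```

Running the same tally over each gene and comparing per-codon counts between well expressed and poorly expressed genes is one way to flag candidate codons for replacement.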
TABLE III

Codon usage for wild type STI: (without three glycine residues and six histidine residues
and the stop codon)

Table IV

Codon usage for A. niger codon optimized STI: (without three glycine residues and six
histidine residues and the stop codon):

[161] The optimized DNA was synthesized by MCLAB (South San Francisco) in vitro as a
DNA fragment containing three restriction sites (NheI at the 5' end of the gene and XhoI and
BstEII at the 3' end), a kexB cleavage site and three glycine residues at the N-terminal end
and six histidine residues at the C-terminal end (SEQ I.D. NO:3). This optimized gene was
cloned into a pCRII-TOPO vector. Following the procedures described in Example 1 above, the
NheI to BstEII fragment was released from the plasmid by restriction digestion and the DNA
fragment was purified on and extracted from an agarose gel and cloned into pSLGAMpR2 to
create expression plasmid pSLGAMpR2-SBTI (Q107).

[162] The expression plasmid was transformed into dgr246ΔGAP:pyr2. The transformation
and shake flask testing of transformants were as in Example 1. Thirty-one transformants
were assayed and an SDS gel was used to check the level of protein expression. Broth from
the top six transformants was assayed for trypsin inhibition activity.

Example 3 

Expression of the Bowman-Birk Inhibitor and its Variants in Aspergillus 

a. BBI fusion to glucoamylase with kexB site and with three glycine residues at the
N-terminal end and six histidine residues at the C-terminal end:

[163] Following procedures described in Example 2 above, the BBI-encoding DNA was
optimized and used for this Example. The DNA was synthesized by MCLAB in vitro as a
DNA fragment containing three restriction sites (NheI at the 5' end of the gene and XhoI and
BstEII at the 3' end), a kexB cleavage site and three glycine residues at the N-terminal end
and six histidine residues at the C-terminal end (SEQ ID NO:76). It was cloned into the
pCRII-TOPO vector (Invitrogen). Following procedures described in Example 1 above, the
NheI to BstEII fragment was released from the plasmid by restriction digestion and the DNA
fragment was extracted from an agarose gel and cloned into pSLGAMpR2 to create expression
plasmid pSLGAMpR2-BBIkex+ (Q104). The expression plasmid was transformed into
dgr246ΔGAP:pyr2. The transformation and shake flask testing of transformants were the same
as in Example 1. Twenty-eight transformants were generated and twenty-five transformants
were assayed in shake flasks. An SDS gel was used to check the level of protein expression.
Broth from the top transformants was assayed for trypsin or chymotrypsin inhibition activity.

b. BBI fusion to glucoamylase with six histidine residues at the C-terminal end:

[164] Following procedures described in Example 2 above, the BBI-encoding DNA was
optimized and used for this Example. The DNA was synthesized by MCLAB in vitro as a
DNA fragment containing three restriction sites (NheI at the 5' end of the gene and XhoI and
BstEII at the 3' end) and six histidine residues at the C-terminal end (SEQ ID NO:42:
GCTAGCGACGATGAGAGCTCTAAGCCCTGTTGCGATCAGTGCGCGTGTACCAAATCGA
ACCCTCCGCAGTGTCGCTGCTCCGATATGCGTCTGAATTCCTGTCATAGCGCATGCAA
GAGCTGTATCTGCGCCCTGAGCTACCCCGCGCAGTGTTTCTGCGTCGACATCACGGAC
TTCTGCTACGAGCCGTGTAAGCCCAGCGAGGACGATAAGGAGAACCATCATCACCATC
ACCATTAGCTCGAGGGTGACC). It was cloned into the pCRII-TOPO vector. Following
procedures described in Example 1 above, the NheI to BstEII fragment was released from the
plasmid by restriction digestion, purified and extracted from an agarose gel, and cloned into
pSLGAMpR2 to create expression plasmid pSLGAMpR2-BBIkex- (Q105). The expression
plasmid was transformed into dgr246ΔGAP:pyr2. The transformation and shake flask testing
of transformants were the same as in Example 1. Thirty-eight transformants were generated
and twenty-five transformants were assayed in shake flasks. An SDS gel was used to check the
level of protein expression. Broth from the top transformants was assayed for trypsin or
chymotrypsin inhibition activity.
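
As a sanity check on a synthetic fragment of this kind, the sequence can be translated in silico to confirm that the flanking restriction sites, the BBI coding region and the six-histidine tag are in the expected positions and reading frame. The sketch below is illustrative only; it uses the standard genetic code and the SEQ ID NO:42 sequence as printed above, and the names `CODON_TABLE` and `translate` are assumptions made for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO:42 as printed in the text (line breaks removed).
SEQ_ID_42 = (
    "GCTAGCGACGATGAGAGCTCTAAGCCCTGTTGCGATCAGTGCGCGTGTACCAAATCGA"
    "ACCCTCCGCAGTGTCGCTGCTCCGATATGCGTCTGAATTCCTGTCATAGCGCATGCAA"
    "GAGCTGTATCTGCGCCCTGAGCTACCCCGCGCAGTGTTTCTGCGTCGACATCACGGAC"
    "TTCTGCTACGAGCCGTGTAAGCCCAGCGAGGACGATAAGGAGAACCATCATCACCATC"
    "ACCATTAGCTCGAGGGTGACC"
)


def translate(dna: str) -> str:
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate(SEQ_ID_42)
has_nhe1_at_5_prime = SEQ_ID_42.startswith("GCTAGC")           # NheI
has_xho1_bste2_at_3_prime = SEQ_ID_42.endswith("CTCGAGGGTGACC")  # XhoI + BstEII
```

Read in frame from the NheI site, the fragment encodes Ala-Ser (from the NheI site itself), followed by the BBI sequence of Figure 3A and the six histidine residues, and then the stop codon.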

c. BBI fusion to glucoamylase with kexB site and three glycine residues at the N-terminal end:

[165] The plasmid DNA, synthesized by MCLAB in vitro (SEQ ID NO:5) and cloned
into the pCRII-TOPO vector, was used as the DNA template for PCR amplification. Two primers
were designed: 5' GGG CTA GCA ACG TCA TCT CCA AG 3' (SEQ ID NO:43) and 5' GGG
GTC ACC TAG TTC TCC TTA TCG TCC TCG CTG 3' (SEQ ID NO:44). The DNA was
amplified in the presence of the primers under the following conditions: The DNA was diluted
10 to 100 fold with Tris-EDTA buffer. Ten microliters of diluted DNA was added to the reaction
mixture, which contained 0.2 mM of each nucleotide (A, G, C and T), 1x reaction buffer, and 0.5
to 0.6 microgram of primer 1 (SEQ ID NO:43) and primer 2 (SEQ ID NO:44) in a total of 100
microliters in an Eppendorf tube. After heating the mixture at 100°C for 5 minutes,
2.5 units of Taq DNA polymerase were added to the reaction mix. Each PCR cycle consisted of
denaturation at 95°C for 1 minute, annealing of the primers to the template at 50°C for 1
minute and extension at 72°C for 1 minute. This cycle was repeated 30 times, followed by
an additional extension at 68°C for 7 minutes, before the reaction was stored at 4°C for
further use. The PCR fragment detected by agarose gel was then cloned into the plasmid vector
pCRII-TOPO (Invitrogen). The resulting PCR fragment contains a sequence identical to SEQ ID
NO:76, except that the nucleotides encoding the six histidine residues and the XhoI restriction
site were removed. Following procedures described in Example 1 above, the PCR fragment
was digested with restriction enzymes NheI and BstEII. The digested DNA fragment was
precipitated with ethanol and cloned into pSLGAMpR2 to create expression plasmid
pSLGAMpR2-BBI without histag (Q108). The expression plasmid was transformed into
dgr246ΔGAP:pyr2. The transformation and shake flask testing of transformants were the same
as described in Example 1. Fifty-seven transformants were generated and twenty-five
transformants were assayed in shake flasks. An SDS gel was used to check the level of
protein expression. Broth from the top transformants was assayed for trypsin or
chymotrypsin inhibition activity.
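
The role of the two primers can be illustrated by locating the cloning sites they carry. The sketch below is a hedged illustration, not part of the described procedure: it assumes the usual recognition sequences (NheI GCTAGC, XhoI CTCGAG, BstEII GGTNACC), and the helper names `find_sites` and `reverse_complement` are introduced only for this example.

```python
import re

# Recognition sequences; N in the BstEII site is written as a character class.
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG", "BstEII": "GGT[ACGT]ACC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_sites(dna: str) -> dict:
    """Map each enzyme name to the 0-based start positions of its site."""
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", dna)]
        for name, pattern in SITES.items()
    }


# Primers from the text, with the triplet spacing removed.
PRIMER_43 = "GGG CTA GCA ACG TCA TCT CCA AG".replace(" ", "")
PRIMER_44 = "GGG GTC ACC TAG TTC TCC TTA TCG TCC TCG CTG".replace(" ", "")

forward_sites = find_sites(PRIMER_43)
# The reverse primer is read on the coding strand via its reverse complement.
coding_strand_3_prime = reverse_complement(PRIMER_44)
reverse_sites = find_sites(coding_strand_3_prime)
```

On the coding strand, the reverse primer places a TAG stop codon directly after the final BBI codons and overlapping the BstEII site, with no histidine codons and no XhoI site in between, which is consistent with the description of the resulting fragment.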

d. BBI fusion to glucoamylase with kexB site:

[166] The plasmid DNA, synthesized by MCLAB in vitro (SEQ ID NO:1) and cloned
into the pCRII-TOPO vector, was used as the DNA template for PCR amplification. Two primers
were designed: 5' GGG GTC ACC TAG TTC TCC TTA TCG TCC TCG CTG 3' (SEQ ID
NO:44) and 5' GGG CTA GCA ACG TCA TCT CCA AGC GCG ACG ATG AGA GCT CTA
AG 3' (SEQ ID NO:45). The resulting PCR fragment contains a sequence identical to SEQ
ID NO:76 (Figure 1C), except that the nucleotides encoding the three glycine residues and six
histidine residues and the XhoI restriction site were removed. Following procedures described
in Example 1 above, the PCR fragment was digested with restriction enzymes NheI and
BstEII. The digested DNA fragment was precipitated with ethanol and cloned into
pSLGAMpR2 to create expression plasmid pSLGAMpR2-BBI without 3G and histag (Q109).
The expression plasmid was transformed into dgr246ΔGAP:pyr2. The transformation and
shake flask testing of transformants were the same as in Example 1. One hundred and
twenty-seven transformants were generated and forty-two transformants were assayed in shake
flasks. An SDS gel was used to check the level of protein expression. Broth from the top
transformants was assayed for trypsin or chymotrypsin inhibition activity.

Example 4 

Expression of the Bowman-Birk Inhibitor and its Variants (loop replacement by other
binders) in Aspergillus

[167] Variant sequences were introduced into one or both loops of BBI using standard
procedures known in the art. Variant sequences were determined by panning a
commercially available phage peptide library, Ph.D.-C7C (New England Biolabs, Beverly, MA),
against target proteins or substrates for 3 rounds according to the manufacturer's
instructions, or by using sequences with known activity. In the sequences provided below, the
alterations introduced into the loop nucleotide sequence are indicated by lower case
nucleotides.

a. BBI with a-VEGF (CK37281) in loop I

[168] The plasmid DNA, synthesized by MCLAB in vitro (SEQ ID NO:1) and cloned
into the pCRII-TOPO vector, was used as the DNA template for PCR amplification. Two primers
were designed:

5' GTTGCGATCAGTGCGCGTGTtacaatctgtatggctggaccTGTCGCTGCT 3' (SEQ ID NO:46) and

5' CGCATATCGGAGCAGCGACAggtccagccatacagattgtaACACGCGCAC 3' (SEQ ID NO:47)

to introduce a peptide sequence that binds to VEGF (denoted a-VEGF) to inhibit VEGF
function. PCR was performed by heating the mixture at 94°C for 2 min, followed by 30 cycles
of 94°C for 30 seconds, 63°C for 30 seconds and 72°C for 30 seconds. After 30 cycles,
the mixture was incubated at 72°C for 4 min before it was stored at 4°C. The replacement
binding loop was verified by DNA sequencing. The NheI to BstEII DNA fragment was
released from the plasmid by restriction digestion, purified and cloned into pSLGAMpR2 to
create expression plasmid pSLGAMpR2-BBI (CK37281) in loop I (Q117). The expression
plasmid was transformed into dgr246ΔGAP:pyr2. The transformation and shake flask testing
of transformants were the same as in Example 1. More than thirty transformants were generated
and forty-two transformants were assayed in shake flasks. An SDS gel was used to check
the level of protein expression.
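
The lower-case portion of each primer pair encodes the replacement loop, and the two primers carry it on opposite strands. A short sketch of how this can be checked for SEQ ID NO:46 and NO:47 follows. It is an illustration under the standard genetic code, and the helper names (`lowercase_insert`, `reverse_complement`, `translate`) are assumptions made for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMER_46 = "GTTGCGATCAGTGCGCGTGTtacaatctgtatggctggaccTGTCGCTGCT"
PRIMER_47 = "CGCATATCGGAGCAGCGACAggtccagccatacagattgtaACACGCGCAC"


def lowercase_insert(primer: str) -> str:
    """Return the lower-case (altered) nucleotides, converted to upper case."""
    return "".join(ch for ch in primer if ch.islower()).upper()


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate an in-frame upper-case DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


forward_insert = lowercase_insert(PRIMER_46)
reverse_insert = lowercase_insert(PRIMER_47)
loop_peptide = translate(forward_insert)
```

Here the insert is 21 nucleotides long and translates to a seven-residue peptide, the same length as the native loop I sequence between the flanking cysteine codons.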

b. BBI with a-VEGF (CK37281) peptide in loop II:

[169] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures as described in the example
above, except that the following two primers were used:

5' CATGCAAGAGCTGTATCTGCtacaatctgtatggctggaccCAGTGTTTCTG 3' (SEQ ID NO:48)
5' GATGTCGACGCAGAAACACTGggtccagccatacagattgtaGCAGATACAG 3' (SEQ ID NO:49)

c. BBI with a-VEGF (CK37281) peptide in loop I and II:

[170] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following
four primers were used:

5' GTTGCGATCAGTGCGCGTGTtacaatctgtatggctggaccTGTCGCTGCT 3' (SEQ ID NO:46)
5' CGCATATCGGAGCAGCGACAggtccagccatacagattgtaACACGCGCAC 3' (SEQ ID NO:47)
5' CATGCAAGAGCTGTATCTGCtacaatctgtatggctggaccCAGTGTTTCTG 3' (SEQ ID NO:48)
5' GATGTCGACGCAGAAACACTGggtccagccatacagattgtaGCAGATACAG 3' (SEQ ID NO:49)

d. BBI with a-complement protein c2 peptide in loop I:

[171] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to c2 (denoted a-c2) to inhibit
c2 function:

5' GCGATCAGTGCAGCTGTagctgcggcaggaagatccccatccagtgcTGTCGCTGCTCCGATATGC
GTC 3' (SEQ ID NO:50)

5' GAGCAGCGACAgcactggatggggatcttcctgccgcagctACAGCTGCACTGATCGCAACAGGGC
TTA 3' (SEQ ID NO:51)

e. BBI with a-complement protein c3 peptide in loop I:

[172] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to c3 (denoted a-c3) to inhibit
c3 function:

5' GCGATCAGTGCGGCTGTgccaggagcaacctcgacgagTGTCGCTGCTCCGATATGCGTC 3'
(SEQ ID NO:52)

5' GAGCAGCGACActcgtcgaggttgctcctggcACAGCCGCACTGATCGCAACAGGGCTTA 3'
(SEQ ID NO:53)

f. BBI with a-complement protein c4 peptide in loop I:

[173] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to c4 (denoted a-c4) to inhibit
c4 function:

5' GCGATCAGTGCGCGTGTcagagggccctccccatcctcTGTCGCTGCTCCGATATGCGTC 3'
(SEQ ID NO:55)

5' GAGCAGCGACAgaggatggggagggccctctgACACGCGCACTGATCGCAACAGGGCTTA 3'
(SEQ ID NO:56)

g. BBI with a-complement protein c5 peptide in loop I:

[174] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to c5 (denoted a-c5) to inhibit
c5 function:

5' GCGATCAGTGCCAGTGTggcaggctccacatgaagaccTGTCGCTGCTCCGATATGCGTC 3'
(SEQ ID NO:57)

5' GAGCAGCGACAggtcttcatgtggagcctgccACACTGGCACTGATCGCAACAGGGCTTAGA 3'
(SEQ ID NO:58)

h. BBI with a-human complement Factor B peptide in loop I:

[175] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to Factor B (denoted a-Factor
B) to inhibit Factor B function:

5' GCGATCAGTGCCAGTGTaagaggaagatcgtcctcgacTGTCGCTGCTCCGATATGCGTC 3'
(SEQ ID NO:59)

5' GAGCAGCGACAgtcgaggacgatcttcctcttACACTGGCACTGATCGCAACAGGGCTTAGA 3'
(SEQ ID NO:60)

i. BBI with a-Membrane Metalloprotease 2 (MMP2) peptide in loop I:

[176] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to MMP2 (denoted a-MMP2)
to inhibit MMP2 function:

5' CAGTGCGCGTGTgccgccatgttcggccccgccTGTCGCTGCTCCGATATGCGTC 3' (SEQ ID
NO:61)

5' GAGCAGCGACAggcggggccgaacatggcggcACACGCGCACTGATCGCAACAG 3' (SEQ ID
NO:62)

j. BBI with a-Membrane Metalloprotease 12 (MMP12) peptide in loop I:

[177] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to MMP12 (denoted
a-MMP12) to inhibit MMP12 function:

5' CAGTGCGCGTGTggcgccctcggcctcttcggcTGTCGCTGCTCCGATATGCGTC 3' (SEQ ID
NO:63)

5' GAGCAGCGACAgccgaagaggccgagggcgccACACGCGCACTGATCGCAACAG 3' (SEQ ID
NO:64)

k. BBI with cotton binding peptide 2314 in loop I:

[178] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to cotton:

5' GTTGCGATCAGTGCGCGTGTgagcccctgatccaccagcgcTGTCGCTGCT 3' (SEQ ID NO:65)

5' CGCATATCGGAGCAGCGACAgcgctggtggatcaggggctcACACGCGCAC 3' (SEQ ID NO:66)

l. BBI with cotton binding peptide 2317 in loop I:

[179] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce a peptide sequence that binds to cotton:

5' GTTGCGATCAGTGCGCGTGTagcgccttccgcggccccaccTGTCGCTGCT 3' (SEQ ID NO:67)

5' CGCATATCGGAGCAGCGACAggtggggccgcggaaggcgctACACGCGCAC 3' (SEQ ID NO:68)

m. BBI with compstatin loop in loop I:

[180] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce the compstatin peptide sequence:

5' GTTGCGATCAGTGCGCGTGTgttgttcaggactggggccaccaccgcTGTCGCTGCT 3' (SEQ ID NO:69)

5' CGCATATCGGAGCAGCGACAgcggtggtggccccagtcctgaacaacACACGCGCAC 3' (SEQ ID NO:70)

[181] In this case, the 7 amino acids from the BBI Trypsin binding loop were replaced by 9
amino acids from the compstatin binding loop.

n. BBI with compstatin loop in loop II:

[182] For plasmid construction, obtaining fungal transformants and assaying fungal
transformants in shake flasks, we followed the same procedures, except that the following two
primers were used to introduce the compstatin peptide sequence:

5' CATGCAAGAGCTGTATCTGCgttgttcaggactggggccaccaccgcTGTTTCTGCG 3' (SEQ ID NO:71)

5' GTGATGTCGACGCAGAAACAgcggtggtggccccagtcctgaacaacGCAGATACAG 3' (SEQ ID NO:72)

[183] In this case, the 7 amino acids from the BBI Trypsin binding loop were replaced by 9
amino acids from the compstatin binding loop.

Example 5 

Expression of the Bowman-Birk Inhibitor and its Variants in Trichoderma reesei 
[184] Following procedures described in Example 2 above, the BBI-encoding DNA was
optimized and used for this Example. Two primers were designed to amplify the DNA
fragment using plasmid pSLGAMpR2-BBI or pSLGAMpR2-BBI with a-VEGF (CK37281)
peptide in loop I and II as templates:

5' GGA CTA GTA AGC GCG ACG ATG AGA GCT CT 3' (SEQ ID NO:73)

5' AAG GCG CGC CTA GTT CTC CTT ATC GTC CT 3' (SEQ ID NO:74)

A third primer was also used to create a PCR fragment which contains three glycine
residues at the N-terminal end of the BBI protein when used in conjunction with primer #2
(SEQ ID NO:74) above.

5' GGA CTA GTA AGC GCG GCG GTG GCG ACG ATG AGA GCT CT 3' (SEQ ID NO:75).

[185] The PCR fragment was cut with restriction enzymes SpeI and AscI and ligated into the
Trichoderma expression plasmid pTrex4 (Figure 8) to create an expression plasmid. pTrex4
is a modified version of pTREX2 (see Figure 9), which in turn is a modified version of
pTEX. See PCT Publication No. WO 96/23928, herein incorporated by reference, for a
complete description of the preparation of the pTEX vector, which contains a CBHI promoter
and terminator for gene expression and a Trichoderma pyr4 gene as a selection marker for
transformants. In the pTrex4 plasmid, the BBI gene was fused to the C-terminus of the
CBH I core and linker from T. reesei. The amdS gene from A. nidulans was used as the
selection marker during fungal transformation. The expression plasmid was transformed
into Trichoderma reesei. Stable transformants were isolated on Trichoderma minimal plates
with acetamide as the nitrogen source. The transformants were grown on the amd minus
plate, which contains 1ml/l 1000X salts, 20g/l Noble Agar, 1.68g/l CsCl, 20g/l Glucose,
15g/l KH2PO4, 0.6g/l MgSO4*7H2O, 0.6g/l CaCl2*2H2O and 0.6g/l Acetamide. The final pH
was adjusted to 4.5. The 1000x salts contains 5g/l FeSO4, 1.6g/l MnSO4, 1.4g/l ZnSO4 and
1g/l CoCl2. It was filter sterilized. After three days of incubation at 28°C, the
transformants were transferred to fresh amd minus plates and grown for another three
days at 28°C.

[186] The transformants were then inoculated into T. reesei proflo medium (50 ml for each
transformant) in 250-ml shake flasks. T. reesei proflo medium contains 30g/l Alpha-lactose,
6.5g/l (NH4)2SO4, 2g/l KH2PO4, 0.3g/l MgSO4*7H2O, 0.2g/l CaCl2, 1ml/l 1000x TRI Trace
Salts, 2ml/l 10% Tween 80, 22.5g/l Proflo and 0.72g/l CaCO3. The 1000x TRI Trace Salts
contains 5g/l FeSO4*7H2O, 1.6g/l MnSO4*H2O and 1.4g/l ZnSO4*7H2O. After growing
at 30°C for 2 days, 4 ml of culture was transferred into defined medium, which contains 5g/l
(NH4)2SO4, 33g/l PIPPS buffer, 9g/l CASAMINO ACIDS, 4.5g/l KH2PO4, 1g/l CaCl2, 1g/l
MgSO4*7H2O, 5ml/l MAZU and 2.5ml/l 400X T. reesei TRACE. Its pH was adjusted to 5.5
and 40ml/l of 40% lactose was added after sterilization. The 400X T. reesei TRACE contains
175g/l Citric Acid (anhydrous), 200g/l FeSO4*7H2O, 16g/l ZnSO4*7H2O, 3.2g/l CuSO4*5H2O,
1.4g/l MnSO4*H2O and 0.8g/l H3BO3 (Boric Acid).
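
Because the recipes are given per litre while each transformant is grown in 50 ml, the amounts per flask follow by simple scaling. The sketch below shows the arithmetic for the proflo medium; the dictionary and function names are assumptions for illustration, and the per-litre values are those listed above.

```python
# Solid components of T. reesei proflo medium, in grams per litre.
PROFLO_G_PER_L = {
    "Alpha-lactose": 30.0,
    "(NH4)2SO4": 6.5,
    "KH2PO4": 2.0,
    "MgSO4*7H2O": 0.3,
    "CaCl2": 0.2,
    "Proflo": 22.5,
    "CaCO3": 0.72,
}

# Liquid stock additions, in millilitres per litre.
PROFLO_ML_PER_L = {
    "1000x TRI Trace Salts": 1.0,
    "10% Tween 80": 2.0,
}


def scale_recipe(per_litre: dict, volume_ml: float) -> dict:
    """Scale per-litre amounts to the given culture volume in millilitres."""
    return {name: amount * volume_ml / 1000.0 for name, amount in per_litre.items()}


flask_solids_g = scale_recipe(PROFLO_G_PER_L, 50)
flask_liquids_ml = scale_recipe(PROFLO_ML_PER_L, 50)
```

In practice the medium would normally be prepared in bulk and dispensed, so the per-flask figures serve mainly as a check on the totals.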

[187] About 40 transformants were generated on the plates and 20 were assayed in shake
flasks. The supernatant of the culture was used for SDS-PAGE analysis and assayed for
trypsin or chymotrypsin inhibitory activity. Western blot also showed the presence of both
the fusion protein (CbhI-BBI) and BBI alone.

Example 6 

Co-Expression of the Bowman-Birk Inhibitor and Secretory Chaperones in Aspergillus 
[188] The following example details how secretion can be enhanced. STI protein contains
two disulfide bonds and BBI contains seven disulfide bonds in their tertiary structures, and
these disulfide bonds are important for their function. It is known that folding of proteins
with disulfide bonds requires Protein Disulfide Isomerase (PDI) or other chaperones in the ER.

[189] Enhancement of STI or BBI expression was investigated by co-transformation of two
plasmids or by sequential transformation of two plasmids, one containing the STI or BBI
expression cassette and the other containing the PDI genes or chaperone genes. First,
we co-transformed plasmid pSLGAMpR2-BBI without 3G and histag (Q109) with plasmid Q51,
which contains a 4.6 kb genomic DNA fragment covering the region of the pdiA gene from
Aspergillus niger in vector pUC219, into the same strain (dgr246ΔGAP:pyr2). Fifty-one
transformants were obtained and forty-seven transformants were screened in shake flasks.
Transformant #14 was selected because it produced the highest amount of BBI protein based
on SDS gel data. The expression level of BBI protein is higher in the co-transformed strain
than in the strain containing only plasmid pSLGAMpR2-BBI without 3G and histag (Q109).
Figure 7 illustrates the enhanced BBI expression. This strain was also spore purified and
tested again in shake flasks.

[190] Following procedures described above, we also co-transformed plasmid
pSLGAMpR2-BBI without histag (Q108) with plasmid Q51 containing the pdiA gene (same as
above) into the same strain (dgr246ΔGAP:pyr2). Thirty-four transformants were screened in
shake flasks. One transformant was selected for its ability to produce BBI protein at the
highest level based on the SDS gel data. The expression level of BBI protein is higher than
in the strain containing only plasmid pSLGAMpR2-BBI without histag (Q108).

[191] Following procedures described above, we also co-transformed plasmid
pSLGAMpR2-BBI without 3G and histag (Q109) with plasmid Q124, which contains a 1623 bp
genomic DNA fragment covering the region of the prpA gene from Aspergillus niger in vector
pUC219, into the same strain (dgr246ΔGAP:pyr2). Twenty-eight transformants were screened
in shake flasks. One transformant was selected for its ability to produce the highest amount
of BBI protein based on the SDS gel data. The expression level of BBI protein is higher in
the co-transformed strain than in the strain containing only plasmid pSLGAMpR2-BBI without
3G and histag (Q109). Figure 7 illustrates the enhancement (lane 15 vs lane 3). This strain
was spore purified and tested again in shake flasks.

Example 7 

Recombinant Protease Inhibitor Variants Retain Activity 
[192] STI, BBI and variants thereof produced using the methods described above were
tested for activity, e.g., inhibition of protease activity.

a. Protease inhibition

[193] 950 µl of Tris-buffered saline + 0.02% Tween 20 is combined with 20 µl protease
(100 µg/ml in 1mM HCl (bovine trypsin or chymotrypsin)) and 20 µl sample. The solution is
mixed and incubated for 30 min. at room temperature. 10 µl substrate (for trypsin: succinyl-
ala-ala-pro-arg-paranitroanilide, 10 mg/ml in DMSO; for chymotrypsin: succinyl-ala-ala-pro-
phe-paranitroanilide, 10 mg/ml in DMSO) is added and the solution mixed. Absorbance is
monitored at 405nm and the rate determined (A405/min). The fraction of protease activity
inhibited is determined by comparison with a control sample blank and calculated according
to the following equation:

[Equation illegible in the source; legible terms include A405/min (sample) and MW inhibitor.]
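
The rate comparison described in this assay can be sketched numerically. The example below is a hedged illustration, not the equation of the application: it assumes the simplest reading, in which the fraction inhibited is one minus the ratio of the sample rate to the blank rate, and it omits any molecular-weight or concentration terms the original equation may include. The function names and the absorbance readings are hypothetical.

```python
def rate_per_min(times_min, absorbances):
    """Least-squares slope of absorbance at 405 nm versus time (A405/min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    numerator = sum((t - mean_t) * (a - mean_a)
                    for t, a in zip(times_min, absorbances))
    denominator = sum((t - mean_t) ** 2 for t in times_min)
    return numerator / denominator


def fraction_inhibited(rate_sample, rate_blank):
    """Assumed form: 1 - (sample rate / control blank rate)."""
    return 1.0 - rate_sample / rate_blank


# Hypothetical readings, for illustration only.
times = [0.0, 1.0, 2.0, 3.0]
blank_a405 = [0.100, 0.200, 0.300, 0.400]    # protease without inhibitor
sample_a405 = [0.100, 0.125, 0.150, 0.175]   # protease with inhibitor sample

blank_rate = rate_per_min(times, blank_a405)
sample_rate = rate_per_min(times, sample_a405)
inhibited = fraction_inhibited(sample_rate, blank_rate)
```

With these made-up readings the sample rate is a quarter of the blank rate, so three quarters of the protease activity would be counted as inhibited.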

b. Inhibition of HUVEC proliferation by a-VEGF peptides.

[194] HUVE cells (Cambrex, East Rutherford, NJ) were passaged 1-5 times and
maintained according to the manufacturer's instructions. HUVEC growth was stimulated by
0.03 to 20 ng/ml VEGF, with the highest proliferation at 10 ng/ml VEGF165 (R&D Systems); this
concentration was used in subsequent experiments. A series of a-VEGF peptides (see
Example 4) from 0.00052 µM to 25 µM and an anti-VEGF MAb control (R&D Systems) were
mixed with 10 ng/mL VEGF prior to addition to HUVECs seeded in triplicate in 96-well
plates. Cell proliferation was measured by 3H-thymidine incorporation. Significant inhibition
was observed (data not shown).

[195] It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof will be
suggested to persons skilled in the art and are to be included within the spirit and purview of
this application and scope of the appended claims. All publications, patents, and patent
applications cited herein are hereby incorporated by reference in their entirety for all
purposes.

ABSTRACT OF THE DISCLOSURE

[196] Described herein are protease inhibitors, variants thereof and methods for their 
production.

Figure 1A: Codon optimized BBI (SEQ ID NO:1)

Figure 1B: Codon optimized BBI with three glycine (SEQ ID NO: 5)

GCTAGCAACG TCATCTCCAA GCGCGGCGGT GGCGACGATG AGAGCTCTAA

Figure 1C: Codon optimized BBI with three glycine at N-terminal end
and six histidine residues at C-terminal end (SEQ ID NO: 76)

GCTAGCAACG TCATCTCCAA GCGCGGCGGT GGCGACGATG AGAGCTCTAA 50
GCCCTGTTGC GATCAGTGCG CGTGTACCAA ATCGAACCCT CCGCAGTGTC 100
GCTGCTCCGA TATGCGTCTG AATTCCTGTC ATAGCGCATG CAAGAGCTGT 150
ATCTGCGCCC TGAGCTACCC CGCGCAGTGT TTCTGCGTCG ACATCACGGA 200
CTTCTGCTAC GAGCCGTGTA AGCCCAGCGA GGACGATAAG GAGAACCACC 250
ATCACCATCA CCACTAGCTC GAGGGTGACC 280

Figure 2: Codon Optimized STI (SEQ ID NO:3)

TTCTCGATAA 50
TGAAGGCAAC CCTCTTGAAA ATGGTGGCAC ATACTACATC CTGTCAGACA 100
TCACAGCATT TGGTGGAATC CGCGCAGCCC CTACGGGAAA TGAACGCTGC 150
CCTCTCACTG TGGTGCAATC TCGCAATGAG CTCGACAAAG GGATTGGAAC 200
AATCATCTCG TCCCCTTACC GAATCCGTTT TATCGCCGAA GGCCATCCTC 250
TGAGCCTTAA GTTCGATTCA TTTGCAGTTA
TTGGTTTCGC
TTTCTGATGA
AACTACAAGC
TCCTCAGCAA 450
GCTGAGGATG ACAAATGTGG GGATATTGGG ATTAGTATTG ATCATGATGA 500
TGGAACCAGG CGTCTGGTGG TGTCTAAGAA CAAACCGCTG GTGGTTCAGT 550
TTCAAAAACT TGATAAAGAA TCACTGCACC ATCACCATCA CCACTAGCTC 600
GAGGGTGACC 610



Figure 3A: BBI amino acid sequence (SEQ ID NO:7)

DDESSKPCCD QCACTKSNPP QCRCSDMRLN SCHSACKSCI CALSYPAQCF 50
CVDITDFCYE PCKPSEDDKE N 71

Figure 3B: BBI with three glycine residues at N-terminal end
(SEQ ID NO:8):

GGGDDESSKP CCDQCACTKS NPPQCRCSDM RLNSCHSACK SCICALSYPA 50
QCFCVDITDF CYEPCKPSED DKEN 74

Figure 3C: BBI with three glycine residues at N-terminal end
and six histidine residues at C-terminal end (SEQ ID NO:9):

GGGDDESSKP CCDQCACTKS NPPQCRCSDM RLNSCHSACK SCICALSYPA 50
QCFCVDITDF CYEPCKPSED DKENHHHHHH 80



Figure 4A: STI amino acid sequence (with glycine & His tag)
(SEQ ID NO:10)

GGGDFVLDNE GNPLENGGTY YILSDITAFG GIRAAPTGNE RCPLTVVQSR 50

* **

NELDKGIGTI ISSPYRIRFI AEGHPLSLKF DSFAVIMLCV GIPTEWSVVE 100
DLPEGPAVKI GENKDAMDGW FRLERVSDDE FNNYKLVFCP QQAEDDKCGD 150
IGISIDHDDG TRRLVVSKNK PLVVQFQKLD KESLHHHHHH 190

Figure 4B: STI amino acid sequence (without His tag) (SEQ ID
NO:11)

GGGDFVLDNE GNPLENGGTY YILSDITAFG GIRAAPTGNE RCPLTVVQSR 50

* * *

NELDKGIGTI ISSPYRIRFI AEGHPLSLKF DSFAVIMLCV GIPTEWSVVE 100
DLPEGPAVKI GENKDAMDGW FRLERVSDDE FNNYKLVFCP QQAEDDKCGD 150
IGISIDHDDG TRRLVVSKNK PLVVQFQKLD KESL 184

Figure 4C: STI amino acid sequence (SEQ ID NO:12)

DFVLDNEGNP LENGGTYYIL SDITAFGGIR AAPTGNERCP LTVVQSRNEL 50
DKGIGTIISS PYRIRFIAEG HPLSLKFDSF AVIMLCVGIP TEWSVVEDLP 100
EGPAVKIGEN KDAMDGWFRL ERVSDDEFNN YKLVFCPQQA EDDKCGDIGI 150
SIDHDDGTRR LVVSKNKPLV VQFQKLDKES L 181

FIGURE 9A: pTEX2



AAGCTTAAGG TGCACGGCCC 
ACATGTCCGG TCGCGACGTA 
CGCCTGCAGC CACTTGCAGT 
TTTTGTAGGG TAGGAATTGT 
TCCCCCATAG AGTTCCCAAT 
TTGGGGAGAA GTTGACTTCC 
ATATAGGGTC GGCAACGGCA 
TGTTTGCGAT CTAACATCCA 
ACCACTTTGA TCTGCTGGTA 
GGTAAATCTA CACGTGGGCC 
GGTGCCATTC TTTTCCCTTC 
CGAGCTGTAA CTACCTCTGA 
CGTGCACCTG CATCATGTAT 
AGCAATGTGG GACTTTGATG 
TTGCAAAGTT TTGTTTCGGC 
TTCTGTGTAT TTTTGTGGCA 
CCAAGCTTGC TCTTTTGAGC 
TTGTGAAGTC GGTAATCCCG 
CTCCGAAGCT GCTGCGAACC 
AGCGAGCGGC TAAATTAGCA 
TTGTTGAATC ATGGCGTTCC 
GTAGCAGGCA CTCATTCCCG 
GAACCGGAAT AATATAATAG
ATGCAGGGGT ACTGAGCTTG
CAACCTTTGG CGTTTCCCTG
TATTAACCCA GACTGACCGG
TGTCATTGCG ATGTGTAATT
GCCCGAATGT AGGATTGTTA
AATCTGTGTC GGGCAGGACA
CCGATAGCAG TGTCTAGTAG
GGAAAATACA AACCAATGGC
TCATATACCA GCGGCTAATA
AATTTGCCAA CGGCTTGTGG 
CCCCACGTTT GTTTCTTCAC 
TTGGGTCGCT TGTTTGTTCC 
GTCTGACTCG GAGCGTTTTG 
TGAAATGTTG ACATTCAAGG 
GTGTAAGGAG GTTTGTCTGC 
GATGAAGTGG TCCATATTGA 
TTGAGTTGAA ACTGCCTAAG 
GTGTACATGT TTGTGCTCCG 
CACTGCTGCC TTTACCAAGC 
GGGGCCACTG CATGGTTTCG 
AGCCGATAAA GATAGCCTCA
GCGAATGTGT ATATATAAAG
CCCATCTACT CATCAACTCA
TGAGGCACAG AAACCCAATA
TGCGAAAGCC TGACGCACCG
GCGGCGGGAG CTACATGGCC



ACGTGGCCAC TAGTACTTCT 
CGCGTATCGA TGGCGCCAGC 
CCCGTGGAAT TCTCACGGTG 
CACTCAAGCA CCCCCAACCT 
CAGTGAGTCA TGGCACTGTT
GCCCAGAGCT GAAGGTCGCA 
AAAAAGCACG TGGCTCACCG 
GGAACCTGGA TACATCCATC 
AACTCGTATT CGCCCTAAAC 
CCTTTCGGTA TACTGCGTGT 
CTCTAGTGTT GAATTGTTTG 
ATCTCTGGAG AATGGTGGAC 
ATAATAGTGA TCCTGAGAAG 
GTCATCAAAC AAAGAACGAA
TACGGTGAAG AACTGGATAC
ACAAGAGGCC AGAGACAATC
TACAAGAACC TGTGGGGTAT
CTGTATAGTA ATACGAGTCG
CGGAGAATCG AGATGTGCTG 
TGAAAGGCTA TGAGAAATTC 
ATTCTTCGAC AAGCAAAGCG 
AAAAAACTCG GAGATTCCTA 
GCAATACATT GAGTTGCCTC 
GACATAACTG TTCCGTACCC 
ATTCAGCGTA CCCGTACAAG 
ACGTGTTTTG CCCTTCATTT 
TGCCTGCTTG ACCGACTGGG 
TCCGAACTCT GCTCGTAGAG
CGCCTCGAAG GTTCACGGCA
CAACCTGTAA AGCCGCAATG
TAAAAGTACA TAAGTTAATG
ATTGTACAAT CAAGTGGCTA 
GGTTGCAGAA GCAACGGCAA 
TCAGTCCAAT CTCAGCTGGT 
GGTGAAGTGA AAGAAGACAG 
CATACAACCA AGGGCAGTGA 
AGTATTTAGC CAGGGATGCT 
CGATACGACG AATACTGTAT 
AATGTAAGTC GGCACTGAAC 
ATCTCGGGCC CTCGGGCCTT 
GGCAAATGCA AAGTGTGGTA 
AGCTGAGGGT ATGTGATAGG 
AATAGAAAGA GAAGCTTAGC 
TTAAACGGAA TGAGCTAGTA 
GTTCGAGGTC CGTGCCTCCC 
GATCCTCCAG GAGACTTGTA 
GTCAACCGCG GTTTAGGCGC 
GTAGATTCTT GGTGAGCCCG
CCGGGTGATT TATTTTTTTT

FIGURE 9B: pTEX2



CTGACCCTTT TCAAATATAC 
CTGCTTGGTA TTGCGATGTT 
CACAAAACGA TTCCTTAGTA 
AAAGAGGAAA TTAAAAAAAA 
TAGAATCGCC GCTCTTCGTG 
GCTTGCATGC AAAGATACAC 
CATCCCAACT GGTACGTCAT 
CGCCTAAATA CAGCTGCATT 
TTCTCAGCTC CGTTTGTCCT 
CTTTCTTTAC TTCTTTCTTC 
GTGTAGGCTT TCCACGCTGC 
GTCTGAGGCC TTGAGGATGC 
CGATGCCAAT CAGCTTGTGC 
CCACCGACCG ATCCGTTGGT 
GCAGCCGGGC GTCATGTGGA 
TCTCCTGCGA GATGAAGCCC 
TCGACGCAGG CCTGCGTGTA 
CATTTGGGCG AGGATCAGGA 
CCGGGAAGAG CGACTCGTCG 
GAGGAGACGG ACTCGTACTG 
GCCCTTGCGG CCGTCGCCGG 
CGCCAATGGA GCCCATGCCG 
GAGTCGGCGT CGTCGTCAAA 
GGACGTCTTG ACCTCGCAGG 
GGGCCAGCGA GGCCACCGAC 
ATGTGCGCCC AGTCGATGAT 
GACGGTGTGG CCAATGTCGC 
CGTGCTTGCG CGCCAGCGAC 
TGGAAGTCCC AGCCCGAGAC 
CGACGGGCCA ATCTTGTCGG 
CGTCGGCGCT CAGGCACAGG 
AGGTAAGCCG TCAGCGGGTG 
GGCCTTGAGC GTCGGGTGTG 
GAGGCTGCGG CTGGTTGGAT 
TTTAGAGGGG GGGAAAAAAA 
ATGACGTTGG AAGCGCGACA 
TCGGCGATTG GGAGAATTTC 
GTTGCTTTAA TGTCGGGCTC 
CTGGCAACGA GAGCAGAGCA 
CACGATGCCA AAAAGCTTGT 
TCTAGGGCTG AAACTGTGTT 
TGAATGGGGA ATGAGGAGCG 
AGACGAGCCG CTTTGGCGGT 
CCAGGGCAAC ACGCACTGAG 
TAGACACGCG CCGAGCAGAC 
CTGAATATTA GCACGCATGG 
ATCATAAGTA CGTATGTGCT 
CATGTTGTCT GTCATCCCCC 
TGGGGTTTTG CTGACTGAAT 



GGTCAACTCA TCTTTCACTG 
GTCAGCTTGG CAAATTGTGG 
GCCATGCATT TTAAGATAAC 
AAAAAAAACA AACATCCCGT 
TATCCCAGTA CCAGTTTAAA 
ATCAATCGCA GCTGGGGTAC 
AACAAAAATC GACAAGATGG 
CTATGATGCC GGGCTTTGGA 
CCCTCCCTTT TCCCCCTTCT 
CCTTCCCTCC CCTATCGCAG 
TGATCGGTAC CGCTCTGCCT 
CCCGGCCCAC AATGGCAATG 
GGCGTGTTGT ACTGCTGGCC 
CTGCTGGTCC TCGTCTTCGG 
TAAAGGCATC GTCGGGCTCG 
ATGACAAAGT CCTTGTGCTC 
CTCCTTGTTC ATGAAGTTGC 
GGCCTCGGCT CAGCGGCGCC 
CCCTCGGCGA TGGCCTTTGT 
CTGGGTGACG GTGGTGATGG 
ACCGGTTCGA GTAGATGGGC 
TTGACGGCGC CGGCGGGCTC 
CGAGTCCATG GTGGGCGTGC 
GGTAGCGCTC GAGCCAGCGC 
GCCTTGCCGG GCACCATGTT 
GCGCGCCGAC CCGCCCGTGT 
CAAACTTGCG GTCCTCGAAG 
GCCAGCTGGG CTCCCGTGCC 
CATGTCGTAG TGCGTCTTGA 
CCAGGTACAG CAGCTCGCGC 
TTGGACGCCT TGAGGTCCAT 
CGTCGCCGTC TCGCTCCTGG 
GTGCCATGGC TGATGAGGCT 
AGTTTAACCC TTAGGGTGCC 
AGAGAGAGAT GGCACAATTC 
GCCGTGCGGG AGGAAGAGGA 
GTGCGATCCG AGTCGTCTCG 
GTCCCCTGGT CAAAATTCTA 
GCAGTAGTCG ATGCTAGAAA 
TCATTTCGGC TAGCCCGTGA 
GTTAATGTAT TATTGGCTGT 
CGATGGATTC GCTTGCATGT 
TTGTGATTCG AAGGTGTGTC 
CCAGCCAACA TGCATTGCTG 
ATAGGAGACG TGTTGACTGT 
TCTCAATAAG AGCAATAGGA 
TTTTCCTGCA AATGGTACGT 
ACTCAGGCTC TCATGATCAT 
GGATTCAGCC GCACGAAACA 
FIGURE 9C: pTEX2 



CATGCAGAAG GGAAGCCCCC 
CGGAGAGCTG CCTAGTATGA 
TGAGCTCTGA AGCCGGGCAT 
GGACCCAAGA AGCTCTTGTC 
CGCTGGTTCT ACTTTGGCTC 
CTGCCAGAAT GTCTCTTGAT 
ACAACTGTTC ACCGATCAGG 
CACCTGCTCC GAAGAAGCAA 
AAAGAGCTCT ATCCACTTGA 
AGTCAACCCT GAAGTGGAAG 
GTTTGTCCCA GGACTGGGTG 
AGACCGACTT CAATTGGACC 
CCAAGAGCTC GGTTGCTTCT 
GCTCAAAGTA AAACAATTCA 
GGCTGAGGAG CAAGAGAGAG 
CCAAGTCTCA AACTGACTGC 
TAATATTCCG GAGTATACGT 
AGAACTATAG CTAGCATGCG 
CAGATCCATA TATAGGGCCC 
TGGCCATTCG AATTCGTAAT 
GTTATCCGCT CACAATTCCA 
AAAGCCTGGG GTGCCTAATG 
CTCACTGCCC GCTTTCCAGT 
GAATCGGCCA ACGCGCGGGG 
GCTTCCTCGC TCACTGACTC 
GGTATCAGCT CACTCAAAGG 
ATAACGCAGG AAAGAACATG 
CGTAAAAAGG CCGCGTTGCT 
CGAGCATCAC AAAAATCGAC 
GACTATAAAG ATACCAGGCG 
CCTGTTCCGA CCCTGCCGCT 
GGGAAGCGTG GCGCTTTCTC 
TGTAGGTCGT TCGCTCCAAG 
CCCGACCGCT GCGCCTTATC 
AAGACACGAC TTATCGCCAC 
GAGCGAGGTA TGTAGGCGGT 
TACGGCTACA CTAGAAGAAC 
AGTTACCTTC GGAAAAAGAG 
CCGCTGGTAG CGGTGGTTTT 
AAAAAAGGAT CTCAAGAAGA 
TCAGTGGAAC GAAAACTCAC 
AAAGGATCTT CACCTAGATC 
ATCTAAAGTA TATATGAGTA 
CAGTGAGGCA CCTATCTCAG 
CCTGACTCCC CGTCGTGTAG 
GGCCCCAGTG CTGCAATGAT 
TTTATCAGCA ATAAACCAGC 
CTGCAACTTT ATCCGCCTCC 
AGAGTAAGTA GTTCGCCAGT 



CCAGCCCCCT GTTCATAATT 
AGCAGCAATT GATAACGTTG 
ATGTATCACG TTTCTGCCTA 
ATAAGGTATT TATGAGTGTT 
AACCGCATCC CATAAGCTGA 
GTACAGCGAT CAACAACCGT 
GACGCGAAGA GGACCCAATC 
AAGGGCTATG AGGTGGTGCA 
CAAGGCCAAT GTCGCTCCCG 
TTTGCTTCTC TGATTAGTAT 
CAAATCCCGA AGACAGCTGG 
ACGCATACAG ATGGCCTCCA 
GTATATGTAC GACTCAGCAT 
TGGGCAATAT CGCGATGGGG 
GTAGGCCAAA CGCCAGACTC 
AGGCGGCCGC CATATGCATC 
AGCCGGCTAA CGTTAACAAC 
CAAATTTAAA GCGCTGATAT 
GGGTTATAAT TACCTCAGGT 
CATGGTCATA GCTGTTTCCT 
CACAACATAC GAGCCGGAAG 
AGTGAGCTAA CTCACATTAA 
CGGGAAACCT GTCGTGCCAG 
AGAGGCGGTT TGCGTATTGG 
GCTGCGCTCG GTCGTTCGGC 
CGGTAATACG GTTATCCACA 
TGAGCAAAAG GCCAGCAAAA 
GGCGTTTTTC CATAGGCTCC 
GCTCAAGTCA GAGGTGGCGA 
TTTCCCCCTG GAAGCTCCCT 
TACCGGATAC CTGTCCGCCT 
ATAGCTCACG CTGTAGGTAT 
CTGGGCTGTG TGCACGAACC 
CGGTAACTAT CGTCTTGAGT 
TGGCAGCAGC CACTGGTAAC 
GCTACAGAGT TCTTGAAGTG 
AGTATTTGGT ATCTGCGCTC 
TTGGTAGCTC TTGATCCGGC 
TTTGTTTGCA AGCAGCAGAT 
TCCTTTGATC TTTTCTACGG 
GTTAAGGGAT TTTGGTCATG 
CTTTTAAATT AAAAATGAAG 
AACTTGGTCT GACAGTTACC 
CGATCTGTCT ATTTCGTTCA 
ATAACTACGA TACGGGAGGG 
ACCGCGAGAC CCACGCTCAC 
CAGCCGGAAG GGCCGAGCGC 
ATCCAGTCTA TTAATTGTTG 
TAATAGTTTG CGCAACGTTG 
FIGURE 9D: pTEX2 



TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT 7400 
CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA 7450 
AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 7500 
CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG 7550 
TCATGCCATC CGTAAGATGC TTTTCTGTGA CTGGTGAGTA CTCAACCAAG 7600 
TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC 7650 
AATACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA 7700 
TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG 7750 
AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC 7800 
TTTTACTTTC ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG 7850 
CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC 7900 
TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG 7950 
CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC 8000 
GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC 8050 
ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTCTCGC 8100 
GCGTTTCGGT GATGACGGTG AAAACCTCTG ACACATGCAG CTCCCGGAGA 8150 
CGGTCACAGC TTGTCTGTAA GCGGATGCCG GGAGCAGACA AGCCCGTCAG 8200 
GGCGCGTCAG CGGGTGTTGG CGGGTGTCGG GGCTGGCTTA ACTATGCGGC 8250 
ATCAGAGCAG ATTGTACTGA GAGTGCACCA TAAAATTGTA AACGTTAATA 8300 
TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC ATTTTTTAAC 8350 
CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGCCCGA 8400 
GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA 8450 
ACGTGGACTC CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC 8500 
CCACTACGTG AACCATCACC CAAATCAAGT TTTTTGGGGT CGAGGTGCCG 8550 
TAAAGCACTA AATCGGAACC CTAAAGGGAG CCCCCGATTT AGAGCTTGAC 8600 
GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA AGCGAAAGGA 8650 
GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 8700 
CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTAC TATGGTTGCT 8750 
TTGACGTATG CGGTGTGAAA TACCGCACAG ATGCGTAAGG AGAAAATACC 8800 
GCATCAGGCG CCATTCGCCA TTCAGGCTGC GCAACTGTTG GGAAGGGCGA 8850 
TCGGTGCGGG CCTCTTCGCT ATTACGCCAG CTGGCGAAAG GGGGATGTGC 8900 
TGCAAGGCGA TTAAGTTGGG TAACGCCAGG GTTTTCCCAG TCACGACGTT 8950 
GTAAAACGAC GGCCAGTGCC 